T - The value types to aggregate
public interface Aggregator<T> extends Serializable
In most cases, an Aggregator will turn multiple values into a single value,
for example by computing a sum or finding the minimum or maximum. In some cases
(e.g. finding the top K elements), an implementation may return more than
one value. The Aggregators utility class contains factory methods for creating
a wide range of pre-defined Aggregators that should cover the most common cases.
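To make the single-value versus multi-value distinction concrete, here is a minimal sketch. It does not use Crunch itself: `MiniAggregator` is a hypothetical stand-in that mirrors only the `reset`, `update`, and `results` method shapes, and `SumLongs` and `TopKLongs` are illustrative names, not the pre-defined Aggregators from the utility class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in mirroring part of Aggregator<T>; not the Crunch interface.
interface MiniAggregator<T> {
    void reset();
    void update(T value);
    Iterable<T> results();
}

// Sums all values: results() yields exactly one element.
class SumLongs implements MiniAggregator<Long> {
    private long sum;

    public void reset() { sum = 0L; }

    public void update(Long value) { sum += value; }

    public Iterable<Long> results() { return Collections.singletonList(sum); }
}

// Keeps the k largest values: results() yields up to k elements.
class TopKLongs implements MiniAggregator<Long> {
    private final int k;
    private final PriorityQueue<Long> heap = new PriorityQueue<>(); // min-heap

    TopKLongs(int k) { this.k = k; }

    public void reset() { heap.clear(); }

    public void update(Long value) {
        heap.add(value);
        if (heap.size() > k) {
            heap.poll(); // discard the smallest retained value
        }
    }

    public Iterable<Long> results() {
        List<Long> out = new ArrayList<>(heap);
        out.sort(Comparator.reverseOrder()); // largest first
        return out;
    }
}
```

Both classes expose their result through the same `Iterable` return type, which is what lets an implementation return one value or several.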
Aggregator implementations should usually be associative and commutative, which makes their results deterministic regardless of the order in which values arrive. If your aggregation function isn't commutative, you can still use secondary sort to fix the order of the values and obtain deterministic results.
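The effect of commutativity can be seen without any Crunch code. The sketch below folds the same values in two different orders; `OrderSensitivity` and `fold` are hypothetical names used only for illustration. Addition gives the same result either way, while string concatenation does not.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class OrderSensitivity {
    // Folds the values left to right with the given combining function.
    static <T> T fold(List<T> values, T identity, BinaryOperator<T> op) {
        T acc = identity;
        for (T v : values) {
            acc = op.apply(acc, v);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Addition is associative and commutative: order does not matter.
        System.out.println(fold(Arrays.asList(1L, 2L, 3L), 0L, Long::sum));
        System.out.println(fold(Arrays.asList(3L, 1L, 2L), 0L, Long::sum));

        // Concatenation is associative but not commutative: order matters.
        System.out.println(fold(Arrays.asList("a", "b", "c"), "", String::concat));
        System.out.println(fold(Arrays.asList("c", "a", "b"), "", String::concat));
    }
}
```

With an order-sensitive function like the concatenation here, the values would have to arrive in a fixed order for the result to be reproducible.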
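The lifecycle of an Aggregator always begins with you instantiating
it and passing it to Crunch. When running your Pipeline, Crunch serializes
the instance and deserializes it wherever it is needed on the cluster. This is how
Crunch uses a deserialized instance:
1. Call initialize(Configuration) once.
2. Call reset().
3. Call update(T) once for each value associated with the current key.
4. Call results() to retrieve the aggregated result.
5. Go back to step 2 until all keys have been processed.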
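The lifecycle of a deserialized instance can be sketched as a small driver loop. This is an illustration under stated assumptions, not Crunch's actual implementation: `SketchAggregator` is a hypothetical stand-in for the interface, its `initialize` takes a plain `Map` instead of a Hadoop `Configuration` to avoid the dependency, and `MaxInts` and `LifecycleSketch` are made-up names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Aggregator<T>. The real initialize method takes an
// org.apache.hadoop.conf.Configuration; a Map is used here as an assumption.
interface SketchAggregator<T> {
    void initialize(Map<String, String> conf);
    void reset();
    void update(T value);
    Iterable<T> results();
}

// Tracks the maximum of the values seen since the last reset.
class MaxInts implements SketchAggregator<Integer> {
    private Integer max; // null until the first value for the current key arrives

    public void initialize(Map<String, String> conf) {
        // No setup is needed for this aggregator.
    }

    public void reset() { max = null; }

    public void update(Integer value) {
        if (max == null || value > max) {
            max = value;
        }
    }

    public Iterable<Integer> results() {
        return max == null
                ? Collections.<Integer>emptyList()
                : Collections.singletonList(max);
    }
}

class LifecycleSketch {
    // Drives an aggregator over values that are already grouped by key.
    static <T> Map<String, List<T>> run(SketchAggregator<T> agg,
                                        Map<String, List<T>> grouped) {
        agg.initialize(Collections.<String, String>emptyMap()); // once per instance
        Map<String, List<T>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<T>> entry : grouped.entrySet()) {
            agg.reset(); // clear state left over from the previous key
            for (T value : entry.getValue()) {
                agg.update(value); // once per value of this key
            }
            List<T> result = new ArrayList<>();
            for (T r : agg.results()) {
                result.add(r); // copy the aggregate before the next reset
            }
            out.put(entry.getKey(), result);
        }
        return out;
    }
}
```

Because one instance is reused across keys, any state that `reset` fails to clear would leak from one key's aggregate into the next, which is why the sketch copies the results before moving on.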
|Modifier and Type|Method and Description|
|void|initialize(org.apache.hadoop.conf.Configuration conf): Perform any setup of this instance that is required prior to processing inputs.|
|void|reset(): Clears the internal state of this Aggregator and prepares it for the values associated with the next key.|
|Iterable<T>|results(): Returns the current aggregated state of this instance.|
|void|update(T value): Incorporate the given value into the aggregate state maintained by this instance.|
void initialize(org.apache.hadoop.conf.Configuration conf)
conf - Hadoop configuration
void update(T value)
value - The value to add to the aggregated state
Copyright © 2017 The Apache Software Foundation. All rights reserved.