T - The value types to aggregate

public interface Aggregator<T> extends Serializable
In most cases, an Aggregator will turn multiple values into a single value, such as a sum, a minimum, or a maximum. In some cases (e.g. finding the top K elements), an implementation may return more than one value. The Aggregators utility class contains factory methods for creating pre-defined Aggregators that should cover the most common cases.
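To illustrate the case where an aggregation returns more than one value, here is a minimal top-K sketch in plain Java. The class name `TopKLongs` is hypothetical and the class does not implement the real Crunch interface (which also requires `initialize(Configuration)` and serializability); it only mirrors the `reset()`, `update(T)`, and `results()` method shapes described on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: keeps the K largest values seen since the last reset().
class TopKLongs {
    private final int k;
    // Min-heap: the smallest retained value sits at the head.
    private final PriorityQueue<Long> heap = new PriorityQueue<>();

    TopKLongs(int k) {
        this.k = k;
    }

    public void reset() {
        heap.clear();
    }

    public void update(Long value) {
        heap.add(value);
        if (heap.size() > k) {
            heap.poll(); // discard the smallest so at most K values remain
        }
    }

    // Returns up to K values, largest first.
    public Iterable<Long> results() {
        List<Long> sorted = new ArrayList<>(heap);
        sorted.sort(Collections.reverseOrder());
        return sorted;
    }
}
```

Feeding 5, 1, 9, 3 into `new TopKLongs(2)` after a `reset()` leaves 9 and 5 in `results()`, so a single aggregation yields two values.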
Aggregator implementations should usually be associative and commutative, which makes their results deterministic. If your aggregation function isn't commutative, you can still use secondary sort to make the results deterministic.
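The effect of commutativity can be seen in a small plain-Java sketch that does not use Crunch at all. The class `OrderSensitivityDemo` and its `fold` helper are hypothetical names introduced for illustration: summing gives the same result for any arrival order, while string concatenation does not.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class OrderSensitivityDemo {
    // Folds the values left to right with the given combining function.
    static <T> T fold(List<T> values, T identity, BinaryOperator<T> op) {
        T acc = identity;
        for (T v : values) {
            acc = op.apply(acc, v);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Addition is associative and commutative: order does not matter.
        System.out.println(fold(Arrays.asList(1L, 2L, 3L), 0L, Long::sum)); // prints 6
        System.out.println(fold(Arrays.asList(3L, 1L, 2L), 0L, Long::sum)); // prints 6

        // Concatenation is associative but not commutative: order changes the result.
        System.out.println(fold(Arrays.asList("a", "b", "c"), "", String::concat)); // prints abc
        System.out.println(fold(Arrays.asList("c", "a", "b"), "", String::concat)); // prints cab
    }
}
```

When the order in which values arrive is not fixed, only the first kind of function gives a reproducible result; for the second kind, the order has to be pinned down by other means, such as the secondary sort mentioned above.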
The lifecycle of an Aggregator always begins with you instantiating it and passing it to Crunch. When running your Pipeline, Crunch serializes the instance and deserializes it wherever it is needed on the cluster. This is how Crunch uses a deserialized instance:
1. initialize(Configuration) once
2. reset()
3. update(Object) multiple times until all values of a sequence have been aggregated
4. results() to retrieve the aggregated result

| Modifier and Type | Method and Description |
|---|---|
| void | initialize(org.apache.hadoop.conf.Configuration conf) - Perform any setup of this instance that is required prior to processing inputs. |
| void | reset() - Clears the internal state of this Aggregator and prepares it for the values associated with the next key. |
| Iterable<T> | results() - Returns the current aggregated state of this instance. |
| void | update(T value) - Incorporate the given value into the aggregate state maintained by this instance. |
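The call sequence and the four methods summarized above can be sketched in plain Java as follows. This is a simulation under stated assumptions, not Crunch code: `SketchAggregator` is a hypothetical stand-in for the real interface, a `Map<String, String>` stands in for the Hadoop `Configuration`, and `LifecycleDemo.aggregateAll` is a hypothetical driver playing the role of the framework. Serialization is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the documented interface; the Map replaces
// org.apache.hadoop.conf.Configuration so the sketch has no dependencies.
interface SketchAggregator<T> {
    void initialize(Map<String, String> conf);
    void reset();
    void update(T value);
    Iterable<T> results();
}

// Sums the values of one sequence and returns a single result.
class LongSum implements SketchAggregator<Long> {
    private long sum;

    public void initialize(Map<String, String> conf) {
        // No setup needed for a plain sum.
    }

    public void reset() {
        sum = 0L;
    }

    public void update(Long value) {
        sum += value;
    }

    public Iterable<Long> results() {
        return Collections.singletonList(sum);
    }
}

class LifecycleDemo {
    // Plays the role of the framework: initialize once, then for each
    // sequence of values call reset, update for every value, and results.
    static <T> List<List<T>> aggregateAll(SketchAggregator<T> agg,
                                          Map<String, String> conf,
                                          List<List<T>> sequences) {
        agg.initialize(conf);
        List<List<T>> out = new ArrayList<>();
        for (List<T> sequence : sequences) {
            agg.reset();
            for (T value : sequence) {
                agg.update(value);
            }
            List<T> collected = new ArrayList<>();
            for (T result : agg.results()) {
                collected.add(result);
            }
            out.add(collected);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Long>> sequences = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),
                Arrays.asList(10L, 20L));
        System.out.println(aggregateAll(
                new LongSum(), Collections.<String, String>emptyMap(), sequences));
        // prints [[6], [30]]
    }
}
```

Because `reset()` runs before each sequence, the same instance can be reused across keys without state leaking from one key's values into the next.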
void initialize(org.apache.hadoop.conf.Configuration conf)
conf - Hadoop configuration

void reset()

void update(T value)
value - The value to add to the aggregated state

Copyright © 2013 The Apache Software Foundation. All Rights Reserved.