Aggregator (Apache Crunch 0.15.0 API)

This project has retired. For details please refer to its Attic page.

Type Parameters:

T - The value types to aggregate

All Superinterfaces:

Serializable

All Known Implementing Classes:

Aggregators.SimpleAggregator, LAggregator
```
public interface Aggregator<T>
extends Serializable
```
Aggregate a sequence of values into a possibly smaller sequence of the same type.
In most cases, an Aggregator will turn multiple values into a single value, for example by computing a sum or finding the minimum or maximum. In some cases (e.g. finding the top K elements), an implementation may return more than one value. The Aggregators utility class contains factory methods for creating pre-defined Aggregators that should cover the most common cases.
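
As an illustration of an implementation that returns more than one value, here is a minimal top-K sketch. It is not Crunch's own implementation. `SketchAggregator` is a hypothetical stand-in that mirrors the four methods documented on this page, and it assumes a plain `Map<String, String>` in place of Hadoop's `Configuration` so that the example compiles without Hadoop on the classpath.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical stand-in for the Aggregator interface documented here.
// Assumption: a Map replaces org.apache.hadoop.conf.Configuration.
interface SketchAggregator<T> extends Serializable {
    void initialize(Map<String, String> conf);
    void reset();
    void update(T value);
    Iterable<T> results();
}

// Keeps the k largest values seen since the last reset().
class TopKSketch implements SketchAggregator<Long> {
    private final int k;
    // Min-heap: the smallest retained value sits at the head.
    private final PriorityQueue<Long> heap = new PriorityQueue<>();

    TopKSketch(int k) {
        this.k = k;
    }

    @Override
    public void initialize(Map<String, String> conf) {
        // No configuration is needed for this sketch.
    }

    @Override
    public void reset() {
        heap.clear();
    }

    @Override
    public void update(Long value) {
        heap.add(value);
        if (heap.size() > k) {
            heap.poll(); // evict the smallest so at most k values remain
        }
    }

    @Override
    public Iterable<Long> results() {
        List<Long> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder()); // largest first
        return out;
    }
}
```

With `k = 2` and the inputs 5, 1, 9, 3, `results()` yields 9 and 5: a sequence that is smaller than the input but still has more than one element.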

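Aggregator implementations should usually be associative and commutative, so that the result does not depend on the order in which values arrive and is therefore deterministic. If your aggregation function isn't commutative, you can still get deterministic results by using a secondary sort to fix the order of the values.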
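
The effect of commutativity can be seen without any Crunch code. The sketch below uses a hypothetical helper, `OrderSensitivityDemo.fold`, that combines values left to right in whatever order they are supplied, which is roughly what repeated `update` calls do.

```java
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical helper, not part of Crunch: folds values in arrival order.
class OrderSensitivityDemo {
    static <T> T fold(List<T> values, T identity, BinaryOperator<T> op) {
        T acc = identity;
        for (T v : values) {
            acc = op.apply(acc, v);
        }
        return acc;
    }
}
```

Folding `[1, 2, 3]` and `[3, 1, 2]` with `Long::sum` gives 6 both times, because addition is commutative. Folding `["a", "b"]` and `["b", "a"]` with `String::concat` gives `"ab"` and `"ba"`, so a concatenating aggregation is only deterministic if the order of the values is fixed.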

The lifecycle of an Aggregator always begins with you instantiating it and passing it to Crunch. When running your Pipeline, Crunch serializes the instance and deserializes it wherever it is needed on the cluster. This is how Crunch uses a deserialized instance:
1. call initialize(Configuration) once
2. call reset()
3. call update(Object) multiple times until all values of a sequence have been aggregated
4. call results() to retrieve the aggregated result
5. go back to step 2 until all sequences have been aggregated
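
The numbered steps above can be sketched as a small driver loop. This is only an approximation of the documented call order, not Crunch's actual runtime code. `SketchAggregator`, `SumSketch`, and `LifecycleSketch.run` are hypothetical names, and a `Map<String, String>` is assumed in place of Hadoop's `Configuration` so the example runs on plain Java.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Aggregator interface documented here.
// Assumption: a Map replaces org.apache.hadoop.conf.Configuration.
interface SketchAggregator<T> extends Serializable {
    void initialize(Map<String, String> conf);
    void reset();
    void update(T value);
    Iterable<T> results();
}

// Sums all values seen since the last reset().
class SumSketch implements SketchAggregator<Long> {
    private long sum;

    @Override
    public void initialize(Map<String, String> conf) {
        // Nothing to set up for a plain sum.
    }

    @Override
    public void reset() {
        sum = 0;
    }

    @Override
    public void update(Long value) {
        sum += value;
    }

    @Override
    public Iterable<Long> results() {
        return Collections.singletonList(sum);
    }
}

class LifecycleSketch {
    // Applies the documented call order to a list of value sequences.
    static <T> List<T> run(SketchAggregator<T> agg,
                           Map<String, String> conf,
                           List<List<T>> sequences) {
        List<T> out = new ArrayList<>();
        agg.initialize(conf);                  // step 1: once per instance
        for (List<T> sequence : sequences) {   // step 5: repeat for each sequence
            agg.reset();                       // step 2
            for (T value : sequence) {
                agg.update(value);             // step 3
            }
            for (T result : agg.results()) {   // step 4
                out.add(result);
            }
        }
        return out;
    }
}
```

Running `LifecycleSketch.run` with a `SumSketch` over the sequences `[1, 2]` and `[10]` produces `[3, 10]`, one aggregated value per sequence.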

Method Summary

Modifier and Type	Method and Description
`void`	`initialize(org.apache.hadoop.conf.Configuration conf)` Perform any setup of this instance that is required prior to processing inputs.
`void`	`reset()` Clears the internal state of this Aggregator and prepares it for the values associated with the next key.
`Iterable<T>`	`results()` Returns the current aggregated state of this instance.
`void`	`update(T value)` Incorporate the given value into the aggregate state maintained by this instance.

- Method Detail
  - initialize
```
void initialize(org.apache.hadoop.conf.Configuration conf)
```
    Perform any setup of this instance that is required prior to processing inputs.
    
    Parameters:
    
    conf - Hadoop configuration
  - reset
```
void reset()
```
    Clears the internal state of this Aggregator and prepares it for the values associated with the next key. Depending on what you aggregate, this typically means setting a variable to zero or clearing a list. Failing to do this will yield wrong results!
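
    To make the warning concrete, the following sketch compares per-key sums computed with and without the reset step. `RunningSumSketch` and `ResetDemo` are hypothetical, simplified classes written for this illustration; they do not implement the real interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified aggregator state: just enough to show what reset() does.
class RunningSumSketch {
    private long sum;

    void reset() {
        sum = 0;
    }

    void update(long value) {
        sum += value;
    }

    long result() {
        return sum;
    }
}

class ResetDemo {
    // Sums each sequence; when callReset is false, state leaks from one key to the next.
    static List<Long> sumPerKey(long[][] sequences, boolean callReset) {
        RunningSumSketch agg = new RunningSumSketch();
        List<Long> out = new ArrayList<>();
        for (long[] sequence : sequences) {
            if (callReset) {
                agg.reset();
            }
            for (long value : sequence) {
                agg.update(value);
            }
            out.add(agg.result());
        }
        return out;
    }
}
```

    For the sequences `{1, 2}` and `{10}`, the version with reset returns `[3, 10]`, while the version without it returns `[3, 13]` because the first key's sum carries over into the second.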
  - update
```
void update(T value)
```
    Incorporate the given value into the aggregate state maintained by this instance.
    
    Parameters:
    
    value - The value to add to the aggregated state
  - results
```
Iterable<T> results()
```
    Returns the current aggregated state of this instance.

Copyright © 2017 The Apache Software Foundation. All rights reserved.