This project has retired. For details please refer to its Attic page.
Aggregators (Apache Crunch 0.9.0 API)

org.apache.crunch.fn
Class Aggregators

java.lang.Object
  extended by org.apache.crunch.fn.Aggregators

public final class Aggregators
extends Object

A collection of pre-defined Aggregators.

The factory methods of this class return Aggregator instances that you can use to combine the values of a PGroupedTable. In most cases, they turn a multimap (multiple entries per key) into a map (one entry per key).
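To make the multimap-to-map idea concrete, here is a minimal plain-Java sketch that does not use Crunch at all. It models a grouped table as a Map<String, List<Long>> and collapses each key's values into one sum, which is roughly the effect described for SUM_LONGS(). The names GroupedSumSketch and sumPerKey are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a grouped table; not part of the Crunch API.
final class GroupedSumSketch {

  // Collapse each key's list of values (multimap) into a single sum (map).
  static Map<String, Long> sumPerKey(Map<String, List<Long>> grouped) {
    Map<String, Long> result = new LinkedHashMap<String, Long>();
    for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
      long sum = 0L;
      for (Long value : entry.getValue()) {
        sum += value;
      }
      result.put(entry.getKey(), sum);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<Long>> grouped = new LinkedHashMap<String, List<Long>>();
    grouped.put("a", Arrays.asList(1L, 2L, 3L));
    grouped.put("b", Arrays.asList(10L));
    System.out.println(sumPerKey(grouped)); // prints {a=6, b=10}
  }
}
```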

Note: When using composed aggregators, like those built by the pairAggregator() factory method, you typically don't want to pass the same child aggregator instance more than once, even if all child aggregators have the same type. In most cases, this is what you want:

   PTable<K, Pair<Long, Long>> result = groupedTable.combineValues(
      pairAggregator(SUM_LONGS(), SUM_LONGS())
   );
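One plausible reason for the note above is that an aggregator holds state while it processes a key's values, so two components backed by one object would share that state. The following plain-Java sketch illustrates the effect under that assumption. LongSum and aggregatePairs are hypothetical, simplified stand-ins and not the Crunch Aggregator interface.

```java
// Hypothetical, simplified model of a stateful aggregator; not the Crunch interface.
final class SharedStateSketch {

  static final class LongSum {
    private long sum;
    void reset() { sum = 0L; }
    void update(long value) { sum += value; }
    long result() { return sum; }
  }

  // Feed the first component of each pair to 'left' and the second to 'right'.
  static long[] aggregatePairs(LongSum left, LongSum right, long[][] pairs) {
    left.reset();
    right.reset();
    for (long[] pair : pairs) {
      left.update(pair[0]);
      right.update(pair[1]);
    }
    return new long[] { left.result(), right.result() };
  }

  public static void main(String[] args) {
    long[][] pairs = { {1L, 10L}, {2L, 20L} };

    // Two separate instances: each component is summed independently.
    long[] separate = aggregatePairs(new LongSum(), new LongSum(), pairs);
    System.out.println(separate[0] + ", " + separate[1]); // prints 3, 30

    // One shared instance: both components accumulate into the same sum.
    LongSum shared = new LongSum();
    long[] mixed = aggregatePairs(shared, shared, pairs);
    System.out.println(mixed[0] + ", " + mixed[1]); // prints 33, 33
  }
}
```

Calling a factory method such as SUM_LONGS() once per component, as in the snippet above, avoids this kind of sharing.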
 


Nested Class Summary
static class Aggregators.SimpleAggregator<T>
          Base class for aggregators that do not require any initialization.
 
Method Summary
static <V> Aggregator<V> FIRST_N(int n)
          Return the first n values (or fewer if there are fewer values than n).
static <V> Aggregator<V> LAST_N(int n)
          Return the last n values (or fewer if there are fewer values than n).
static Aggregator<BigInteger> MAX_BIGINTS()
          Return the maximum of all given BigInteger values.
static Aggregator<BigInteger> MAX_BIGINTS(int n)
          Return the n largest BigInteger values (or fewer if there are fewer values than n).
static Aggregator<Double> MAX_DOUBLES()
          Return the maximum of all given double values.
static Aggregator<Double> MAX_DOUBLES(int n)
          Return the n largest double values (or fewer if there are fewer values than n).
static Aggregator<Float> MAX_FLOATS()
          Return the maximum of all given float values.
static Aggregator<Float> MAX_FLOATS(int n)
          Return the n largest float values (or fewer if there are fewer values than n).
static Aggregator<Integer> MAX_INTS()
          Return the maximum of all given int values.
static Aggregator<Integer> MAX_INTS(int n)
          Return the n largest int values (or fewer if there are fewer values than n).
static Aggregator<Long> MAX_LONGS()
          Return the maximum of all given long values.
static Aggregator<Long> MAX_LONGS(int n)
          Return the n largest long values (or fewer if there are fewer values than n).
static <V extends Comparable<V>> Aggregator<V> MAX_N(int n, Class<V> cls)
          Return the n largest values (or fewer if there are fewer values than n).
static Aggregator<BigInteger> MIN_BIGINTS()
          Return the minimum of all given BigInteger values.
static Aggregator<BigInteger> MIN_BIGINTS(int n)
          Return the n smallest BigInteger values (or fewer if there are fewer values than n).
static Aggregator<Double> MIN_DOUBLES()
          Return the minimum of all given double values.
static Aggregator<Double> MIN_DOUBLES(int n)
          Return the n smallest double values (or fewer if there are fewer values than n).
static Aggregator<Float> MIN_FLOATS()
          Return the minimum of all given float values.
static Aggregator<Float> MIN_FLOATS(int n)
          Return the n smallest float values (or fewer if there are fewer values than n).
static Aggregator<Integer> MIN_INTS()
          Return the minimum of all given int values.
static Aggregator<Integer> MIN_INTS(int n)
          Return the n smallest int values (or fewer if there are fewer values than n).
static Aggregator<Long> MIN_LONGS()
          Return the minimum of all given long values.
static Aggregator<Long> MIN_LONGS(int n)
          Return the n smallest long values (or fewer if there are fewer values than n).
static <V extends Comparable<V>> Aggregator<V> MIN_N(int n, Class<V> cls)
          Return the n smallest values (or fewer if there are fewer values than n).
static <V1,V2> Aggregator<Pair<V1,V2>> pairAggregator(Aggregator<V1> a1, Aggregator<V2> a2)
          Apply separate aggregators to each component of a Pair.
static <V1,V2,V3,V4> Aggregator<Tuple4<V1,V2,V3,V4>> quadAggregator(Aggregator<V1> a1, Aggregator<V2> a2, Aggregator<V3> a3, Aggregator<V4> a4)
          Apply separate aggregators to each component of a Tuple4.
static <V> Aggregator<V> SAMPLE_UNIQUE_ELEMENTS(int maximumSampleSize)
          Collect a sample of unique elements from the input, where 'unique' is defined by the equals method for the input objects.
static Aggregator<String> STRING_CONCAT(String separator, boolean skipNull)
          Concatenate strings, with a separator between strings.
static Aggregator<String> STRING_CONCAT(String separator, boolean skipNull, long maxOutputLength, long maxInputLength)
          Concatenate strings, with a separator between strings.
static Aggregator<BigInteger> SUM_BIGINTS()
          Sum up all BigInteger values.
static Aggregator<Double> SUM_DOUBLES()
          Sum up all double values.
static Aggregator<Float> SUM_FLOATS()
          Sum up all float values.
static Aggregator<Integer> SUM_INTS()
          Sum up all int values.
static Aggregator<Long> SUM_LONGS()
          Sum up all long values.
static <K,V> CombineFn<K,V> toCombineFn(Aggregator<V> aggregator)
          Wrap a CombineFn adapter around the given aggregator.
static <V1,V2,V3> Aggregator<Tuple3<V1,V2,V3>> tripAggregator(Aggregator<V1> a1, Aggregator<V2> a2, Aggregator<V3> a3)
          Apply separate aggregators to each component of a Tuple3.
static Aggregator<TupleN> tupleAggregator(Aggregator<?>... aggregators)
          Apply separate aggregators to each component of a Tuple.
static <V> Aggregator<V> UNIQUE_ELEMENTS()
          Collect the unique elements of the input, as defined by the equals method for the input objects.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

SUM_LONGS

public static Aggregator<Long> SUM_LONGS()
Sum up all long values.

Returns:
The newly constructed instance

SUM_INTS

public static Aggregator<Integer> SUM_INTS()
Sum up all int values.

Returns:
The newly constructed instance

SUM_FLOATS

public static Aggregator<Float> SUM_FLOATS()
Sum up all float values.

Returns:
The newly constructed instance

SUM_DOUBLES

public static Aggregator<Double> SUM_DOUBLES()
Sum up all double values.

Returns:
The newly constructed instance

SUM_BIGINTS

public static Aggregator<BigInteger> SUM_BIGINTS()
Sum up all BigInteger values.

Returns:
The newly constructed instance

MAX_LONGS

public static Aggregator<Long> MAX_LONGS()
Return the maximum of all given long values.

Returns:
The newly constructed instance

MAX_LONGS

public static Aggregator<Long> MAX_LONGS(int n)
Return the n largest long values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
Returns:
The newly constructed instance
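
As an illustration of "the n largest values (or fewer)", the sketch below keeps at most n values in a min-heap while scanning the input once. It is a plain-Java model, not Crunch's implementation. The name LargestNSketch is hypothetical, and two details are assumptions of this sketch because this page does not state them: duplicate values are retained, and results are returned in descending order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper; models the "n largest" behavior without using Crunch.
final class LargestNSketch {

  // Keep at most n values in a min-heap; polling the head drops the smallest.
  static List<Long> largest(Iterable<Long> values, int n) {
    PriorityQueue<Long> heap = new PriorityQueue<Long>();
    for (Long value : values) {
      heap.add(value);
      if (heap.size() > n) {
        heap.poll();
      }
    }
    List<Long> result = new ArrayList<Long>(heap);
    // Assumption of this sketch: report the survivors largest first.
    Collections.sort(result, Collections.reverseOrder());
    return result;
  }

  public static void main(String[] args) {
    System.out.println(largest(Arrays.asList(5L, 1L, 9L, 3L), 2)); // prints [9, 5]
    System.out.println(largest(Arrays.asList(4L), 3));             // prints [4]
  }
}
```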

MAX_INTS

public static Aggregator<Integer> MAX_INTS()
Return the maximum of all given int values.

Returns:
The newly constructed instance

MAX_INTS

public static Aggregator<Integer> MAX_INTS(int n)
Return the n largest int values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
Returns:
The newly constructed instance

MAX_FLOATS

public static Aggregator<Float> MAX_FLOATS()
Return the maximum of all given float values.

Returns:
The newly constructed instance

MAX_FLOATS

public static Aggregator<Float> MAX_FLOATS(int n)
Return the n largest float values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
Returns:
The newly constructed instance

MAX_DOUBLES

public static Aggregator<Double> MAX_DOUBLES()
Return the maximum of all given double values.

Returns:
The newly constructed instance

MAX_DOUBLES

public static Aggregator<Double> MAX_DOUBLES(int n)
Return the n largest double values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
Returns:
The newly constructed instance

MAX_BIGINTS

public static Aggregator<BigInteger> MAX_BIGINTS()
Return the maximum of all given BigInteger values.

Returns:
The newly constructed instance

MAX_BIGINTS

public static Aggregator<BigInteger> MAX_BIGINTS(int n)
Return the n largest BigInteger values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
Returns:
The newly constructed instance

MAX_N

public static <V extends Comparable<V>> Aggregator<V> MAX_N(int n,
                                                            Class<V> cls)
Return the n largest values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
cls - The type of the values to aggregate (must implement Comparable!)
Returns:
The newly constructed instance

MIN_LONGS

public static Aggregator<Long> MIN_LONGS()
Return the minimum of all given long values.

Returns:
The newly constructed instance

MIN_LONGS

public static Aggregator<Long> MIN_LONGS(int n)
Return the n smallest long values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
Returns:
The newly constructed instance

MIN_INTS

public static Aggregator<Integer> MIN_INTS()
Return the minimum of all given int values.

Returns:
The newly constructed instance

MIN_INTS

public static Aggregator<Integer> MIN_INTS(int n)
Return the n smallest int values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
Returns:
The newly constructed instance

MIN_FLOATS

public static Aggregator<Float> MIN_FLOATS()
Return the minimum of all given float values.

Returns:
The newly constructed instance

MIN_FLOATS

public static Aggregator<Float> MIN_FLOATS(int n)
Return the n smallest float values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
Returns:
The newly constructed instance

MIN_DOUBLES

public static Aggregator<Double> MIN_DOUBLES()
Return the minimum of all given double values.

Returns:
The newly constructed instance

MIN_DOUBLES

public static Aggregator<Double> MIN_DOUBLES(int n)
Return the n smallest double values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
Returns:
The newly constructed instance

MIN_BIGINTS

public static Aggregator<BigInteger> MIN_BIGINTS()
Return the minimum of all given BigInteger values.

Returns:
The newly constructed instance

MIN_BIGINTS

public static Aggregator<BigInteger> MIN_BIGINTS(int n)
Return the n smallest BigInteger values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
Returns:
The newly constructed instance

MIN_N

public static <V extends Comparable<V>> Aggregator<V> MIN_N(int n,
                                                            Class<V> cls)
Return the n smallest values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
cls - The type of the values to aggregate (must implement Comparable!)
Returns:
The newly constructed instance
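
The Comparable bound on V is what allows values of an arbitrary type to be ordered. The following plain-Java sketch shows the same bound on a hypothetical helper, SmallestNSketch.smallest, that sorts a copy of the input and keeps the first n entries. It assumes n >= 0 and is only a model of the described result, not how Crunch computes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not part of the Crunch API.
final class SmallestNSketch {

  // The bound V extends Comparable<V> lets Collections.sort order the values.
  static <V extends Comparable<V>> List<V> smallest(Collection<V> values, int n) {
    List<V> sorted = new ArrayList<V>(values);
    Collections.sort(sorted);
    // Return fewer than n values when the input is shorter than n.
    return new ArrayList<V>(sorted.subList(0, Math.min(n, sorted.size())));
  }

  public static void main(String[] args) {
    System.out.println(smallest(Arrays.asList("pear", "apple", "fig"), 2)); // prints [apple, fig]
  }
}
```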

FIRST_N

public static <V> Aggregator<V> FIRST_N(int n)
Return the first n values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
Returns:
The newly constructed instance

LAST_N

public static <V> Aggregator<V> LAST_N(int n)
Return the last n values (or fewer if there are fewer values than n).

Parameters:
n - The number of values to return
Returns:
The newly constructed instance
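
FIRST_N and LAST_N differ only in which end of the value sequence they keep. The sketch below models both in plain Java, where "first" and "last" refer to the order in which the Iterable yields its values. FirstLastSketch is a hypothetical name, and the sliding-window approach for the last n values is a choice made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers; not part of the Crunch API.
final class FirstLastSketch {

  // Keep values until n have been collected, then stop reading.
  static <V> List<V> firstN(Iterable<V> values, int n) {
    List<V> kept = new ArrayList<V>();
    for (V value : values) {
      if (kept.size() >= n) {
        break;
      }
      kept.add(value);
    }
    return kept;
  }

  // Slide a window of size n over the input; the oldest value falls out.
  // Note: ArrayDeque rejects null elements, so this sketch assumes non-null values.
  static <V> List<V> lastN(Iterable<V> values, int n) {
    ArrayDeque<V> window = new ArrayDeque<V>();
    for (V value : values) {
      window.addLast(value);
      if (window.size() > n) {
        window.removeFirst();
      }
    }
    return new ArrayList<V>(window);
  }

  public static void main(String[] args) {
    List<Integer> input = Arrays.asList(1, 2, 3, 4, 5);
    System.out.println(firstN(input, 2)); // prints [1, 2]
    System.out.println(lastN(input, 2));  // prints [4, 5]
  }
}
```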

STRING_CONCAT

public static Aggregator<String> STRING_CONCAT(String separator,
                                               boolean skipNull)
Concatenate strings, with a separator between strings. There is no limit on the length of the concatenated string.

Note: String concatenation is not commutative, so the result depends on the order in which the values are aggregated and is not deterministic!

Parameters:
separator - the separator which will be appended between each string
skipNull - whether to skip null values. If false and a null value is encountered, a NullPointerException is thrown.
Returns:
The newly constructed instance
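
The separator and skipNull parameters can be modeled in a few lines of plain Java. The sketch below is an illustration of the documented behavior, not the Crunch implementation, and ConcatSketch.concat is a hypothetical name. It places the separator only between values and either skips a null or throws NullPointerException, depending on skipNull.

```java
import java.util.Arrays;

// Hypothetical helper; not part of the Crunch API.
final class ConcatSketch {

  static String concat(Iterable<String> values, String separator, boolean skipNull) {
    StringBuilder out = new StringBuilder();
    boolean first = true;
    for (String value : values) {
      if (value == null) {
        if (skipNull) {
          continue; // silently ignore the null value
        }
        throw new NullPointerException("null value and skipNull is false");
      }
      if (!first) {
        out.append(separator); // separator goes between values only
      }
      out.append(value);
      first = false;
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(concat(Arrays.asList("a", null, "b"), ",", true)); // prints a,b
  }
}
```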

STRING_CONCAT

public static Aggregator<String> STRING_CONCAT(String separator,
                                               boolean skipNull,
                                               long maxOutputLength,
                                               long maxInputLength)
Concatenate strings, with a separator between strings. You can specify the maximum length of the output string and of the input strings, if they are > 0. If a value is <= 0, there is no limit.

Any string that is too large (or that would make the output too large) will be silently discarded.

Note: String concatenation is not commutative, so the result depends on the order in which the values are aggregated and is not deterministic!

Parameters:
separator - the separator which will be appended between each string
skipNull - whether to skip null values. If false and a null value is encountered, a NullPointerException is thrown.
maxOutputLength - the maximum length of the output string. If it is <= 0, there is no limit. The output string will contain fewer than maxOutputLength characters.
maxInputLength - the maximum length of the input strings. If it is <= 0, there is no limit. Only input strings with fewer than maxInputLength characters are concatenated.
Returns:
The newly constructed instance
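
The length limits can be sketched in plain Java as follows. This is a model built from the parameter descriptions above, not the Crunch implementation. BoundedConcatSketch is a hypothetical name, and the sketch makes three assumptions: it follows the strict "<" wording for both limits, it counts the separator toward the output length, and it always skips null values.

```java
import java.util.Arrays;

// Hypothetical helper; not part of the Crunch API.
final class BoundedConcatSketch {

  static String concat(Iterable<String> values, String separator,
                       long maxOutputLength, long maxInputLength) {
    StringBuilder out = new StringBuilder();
    boolean empty = true;
    for (String value : values) {
      if (value == null) {
        continue; // assumption: this sketch always skips nulls
      }
      // A limit <= 0 means "no limit"; otherwise the length must stay below it.
      if (maxInputLength > 0 && value.length() >= maxInputLength) {
        continue; // input too large: silently discarded
      }
      long added = (empty ? 0 : separator.length()) + value.length();
      if (maxOutputLength > 0 && out.length() + added >= maxOutputLength) {
        continue; // would make the output too large: silently discarded
      }
      if (!empty) {
        out.append(separator);
      }
      out.append(value);
      empty = false;
    }
    return out.toString();
  }

  public static void main(String[] args) {
    Iterable<String> input = Arrays.asList("ab", "toolongvalue", "cd", "ef");
    System.out.println(concat(input, ",", 6, 5)); // prints ab,cd
    System.out.println(concat(input, ",", 0, 0)); // prints ab,toolongvalue,cd,ef
  }
}
```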

UNIQUE_ELEMENTS

public static <V> Aggregator<V> UNIQUE_ELEMENTS()
Collect the unique elements of the input, as defined by the equals method for the input objects. No guarantees are made about the order in which the final elements will be returned.

Returns:
The newly constructed instance
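
Uniqueness "as defined by the equals method" means that two distinct objects count as one element when they compare equal. A minimal plain-Java illustration, using a HashSet and the hypothetical name UniqueSketch, might look like this. Like the aggregator described above, it makes no promise about the order of the result.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; not part of the Crunch API.
final class UniqueSketch {

  // A HashSet treats two elements as duplicates when equals() returns true
  // (with matching hashCode() values), not only when they are the same object.
  static <V> Set<V> unique(Iterable<V> values) {
    Set<V> seen = new HashSet<V>();
    for (V value : values) {
      seen.add(value);
    }
    return seen;
  }

  public static void main(String[] args) {
    // Two distinct String objects with equal content collapse into one element.
    Set<String> result = unique(Arrays.asList(new String("x"), new String("x"), "y"));
    System.out.println(result.size()); // prints 2
  }
}
```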

SAMPLE_UNIQUE_ELEMENTS

public static <V> Aggregator<V> SAMPLE_UNIQUE_ELEMENTS(int maximumSampleSize)
Collect a sample of unique elements from the input, where 'unique' is defined by the equals method for the input objects. No guarantees are made about which elements will be returned, simply that there will not be any more than the given sample size for any key.

Parameters:
maximumSampleSize - The maximum number of unique elements to return per key
Returns:
The newly constructed instance
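
The only guarantee stated above is the size bound, so any strategy that stops at maximumSampleSize distinct elements fits the description. The plain-Java sketch below keeps the first distinct elements it encounters. That choice is an assumption of this illustration, and UniqueSampleSketch is a hypothetical name, not something taken from Crunch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; not part of the Crunch API.
final class UniqueSampleSketch {

  // Collect distinct elements until the sample is full, then stop reading.
  static <V> Set<V> sample(Iterable<V> values, int maximumSampleSize) {
    Set<V> sample = new HashSet<V>();
    for (V value : values) {
      if (sample.size() >= maximumSampleSize) {
        break;
      }
      sample.add(value); // adding a duplicate leaves the set unchanged
    }
    return sample;
  }

  public static void main(String[] args) {
    System.out.println(sample(Arrays.asList(1, 1, 2, 3, 4), 2).size()); // prints 2
    System.out.println(sample(Arrays.asList(1, 1), 5).size());          // prints 1
  }
}
```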

pairAggregator

public static <V1,V2> Aggregator<Pair<V1,V2>> pairAggregator(Aggregator<V1> a1,
                                                             Aggregator<V2> a2)
Apply separate aggregators to each component of a Pair.


tripAggregator

public static <V1,V2,V3> Aggregator<Tuple3<V1,V2,V3>> tripAggregator(Aggregator<V1> a1,
                                                                     Aggregator<V2> a2,
                                                                     Aggregator<V3> a3)
Apply separate aggregators to each component of a Tuple3.


quadAggregator

public static <V1,V2,V3,V4> Aggregator<Tuple4<V1,V2,V3,V4>> quadAggregator(Aggregator<V1> a1,
                                                                           Aggregator<V2> a2,
                                                                           Aggregator<V3> a3,
                                                                           Aggregator<V4> a4)
Apply separate aggregators to each component of a Tuple4.


tupleAggregator

public static Aggregator<TupleN> tupleAggregator(Aggregator<?>... aggregators)
Apply separate aggregators to each component of a Tuple.
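
"Separate aggregators per component" amounts to folding each position of the tuple with its own operation. The sketch below models this in plain Java with rows of long values and one LongBinaryOperator per column. TupleCombineSketch is a hypothetical name, the rows stand in for tuples, and the sketch assumes every row has as many entries as there are operators.

```java
import java.util.Arrays;
import java.util.function.LongBinaryOperator;

// Hypothetical helper; not part of the Crunch API.
final class TupleCombineSketch {

  // Combine rows position by position: ops[i] folds column i.
  static long[] combine(long[][] rows, LongBinaryOperator[] ops) {
    if (rows.length == 0) {
      return new long[0];
    }
    long[] acc = rows[0].clone();
    for (int r = 1; r < rows.length; r++) {
      for (int i = 0; i < ops.length; i++) {
        acc[i] = ops[i].applyAsLong(acc[i], rows[r][i]);
      }
    }
    return acc;
  }

  public static void main(String[] args) {
    long[][] rows = { {1L, 5L, 9L}, {2L, 3L, 4L}, {3L, 8L, 7L} };
    // Column 0 is summed, column 1 keeps the maximum, column 2 the minimum.
    LongBinaryOperator[] ops = { Long::sum, Math::max, Math::min };
    System.out.println(Arrays.toString(combine(rows, ops))); // prints [6, 8, 4]
  }
}
```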


toCombineFn

public static final <K,V> CombineFn<K,V> toCombineFn(Aggregator<V> aggregator)
Wrap a CombineFn adapter around the given aggregator.

Parameters:
aggregator - The instance to wrap
Returns:
A CombineFn delegating to aggregator
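
The adapter idea can be sketched without Crunch: reset the aggregator, feed it one key's values, and emit each result paired with the key. Everything named below (CombineAdapterSketch, MiniAggregator, combine, LongSum) is hypothetical and deliberately simplified. It is not the Crunch Aggregator or CombineFn API, only an illustration of what "delegating to aggregator" can look like.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class CombineAdapterSketch {

  // Hypothetical, simplified aggregator contract; not the Crunch interface.
  interface MiniAggregator<V> {
    void reset();
    void update(V value);
    Iterable<V> results();
  }

  // Adapter: run the aggregator over one key's values and emit (key, result) entries.
  static <K, V> List<Map.Entry<K, V>> combine(MiniAggregator<V> aggregator, K key,
                                              Iterable<V> values) {
    aggregator.reset();
    for (V value : values) {
      aggregator.update(value);
    }
    List<Map.Entry<K, V>> out = new ArrayList<Map.Entry<K, V>>();
    for (V result : aggregator.results()) {
      out.add(new SimpleEntry<K, V>(key, result));
    }
    return out;
  }

  // Example aggregator that sums long values into a single result.
  static final class LongSum implements MiniAggregator<Long> {
    private long sum;
    public void reset() { sum = 0L; }
    public void update(Long value) { sum += value; }
    public Iterable<Long> results() { return Collections.singletonList(sum); }
  }

  public static void main(String[] args) {
    System.out.println(combine(new LongSum(), "k", Arrays.asList(1L, 2L, 3L))); // prints [k=6]
  }
}
```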


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.