This project has retired. For details please refer to its Attic page.
Aggregate (Apache Crunch 0.11.0 API)

org.apache.crunch.lib
Class Aggregate

java.lang.Object
  extended by org.apache.crunch.lib.Aggregate

public class Aggregate
extends Object

Methods for performing various types of aggregations over PCollection instances.


Nested Class Summary
static class Aggregate.PairValueComparator<K,V>
           
static class Aggregate.TopKCombineFn<K,V>
           
static class Aggregate.TopKFn<K,V>
           
 
Constructor Summary
Aggregate()
           
 
Method Summary
static
<S> PCollection<S>
aggregate(PCollection<S> collect, Aggregator<S> aggregator)
           
static
<K,V> PTable<K,Collection<V>>
collectValues(PTable<K,V> collect)
           
static
<S> PTable<S,Long>
count(PCollection<S> collect)
          Returns a PTable that contains the unique elements of this collection mapped to a count of their occurrences.
static
<S> PTable<S,Long>
count(PCollection<S> collect, int numPartitions)
          Returns a PTable that contains the unique elements of this collection mapped to a count of their occurrences, using numPartitions partitions for the grouping step.
static
<S> PObject<Long>
length(PCollection<S> collect)
          Returns the number of elements in the provided PCollection.
static
<S> PObject<S>
max(PCollection<S> collect)
          Returns the largest numerical element from the input collection.
static
<S> PObject<S>
min(PCollection<S> collect)
          Returns the smallest numerical element from the input collection.
static
<K,V> PTable<K,V>
top(PTable<K,V> ptable, int limit, boolean maximize)
          Selects the top N pairs from the given table, with sorting being performed on the values (i.e. the second value in the pair) of the table.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Aggregate

public Aggregate()

Method Detail

count

public static <S> PTable<S,Long> count(PCollection<S> collect)
Returns a PTable that contains the unique elements of this collection mapped to a count of their occurrences.
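
The counting behaviour described above can be illustrated with a small in-memory analogue. This is a sketch of the semantics only, not Crunch code: the class name CountSketch is hypothetical, a java.util.List stands in for the PCollection, and a java.util.Map stands in for the resulting PTable.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory analogue of the count semantics; not part of the Crunch API.
final class CountSketch {
    // Maps each distinct element to the number of times it occurs in the input.
    static <S> Map<S, Long> count(List<S> elements) {
        Map<S, Long> counts = new LinkedHashMap<>();
        for (S element : elements) {
            // Start at 1 for a new element, otherwise add 1 to the running count.
            counts.merge(element, 1L, Long::sum);
        }
        return counts;
    }
}
```

For the input "a", "b", "a" the sketch maps "a" to 2 and "b" to 1.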


count

public static <S> PTable<S,Long> count(PCollection<S> collect,
                                       int numPartitions)
Returns a PTable that contains the unique elements of this collection mapped to a count of their occurrences, using numPartitions partitions for the grouping step.


length

public static <S> PObject<Long> length(PCollection<S> collect)
Returns the number of elements in the provided PCollection.

Type Parameters:
S - The type of the PCollection.
Parameters:
collect - The PCollection whose elements should be counted.
Returns:
A PObject containing the number of elements in the PCollection.
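
Because the result is a PObject rather than a plain Long, the count is a value that is read from the returned object. A minimal in-memory sketch of that shape follows. It assumes a java.util.function.Supplier as a stand-in for the PObject and an Iterable as a stand-in for the PCollection; the class name LengthSketch is hypothetical and none of this is Crunch code.

```java
import java.util.function.Supplier;

// In-memory analogue of the length semantics; not part of the Crunch API.
final class LengthSketch {
    // Returns a deferred value; the elements are only counted when get() is called.
    static <S> Supplier<Long> length(Iterable<S> elements) {
        return () -> {
            long n = 0L;
            for (S ignored : elements) {
                n++;
            }
            return n;
        };
    }
}
```

Calling get() on the supplier returned for a three-element list yields 3.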

top

public static <K,V> PTable<K,V> top(PTable<K,V> ptable,
                                    int limit,
                                    boolean maximize)
Selects the top N pairs from the given table, with sorting being performed on the values (i.e. the second value in the pair) of the table.

Parameters:
ptable - table containing the pairs from which the top N is to be selected
limit - number of top elements to select
maximize - if true, the maximum N values from the table will be selected, otherwise the minimal N values will be selected
Returns:
table containing the top N values from the incoming table
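
The selection rule described by the parameters above can be sketched in memory as follows. The sketch is an illustration of the semantics, not Crunch code: the class name TopSketch is hypothetical, a list of java.util.Map.Entry pairs stands in for the PTable, and values are assumed to be Comparable. The sketch also returns the selected pairs in sorted order, which the description above does not promise.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// In-memory analogue of the top-N selection; not part of the Crunch API.
final class TopSketch {
    // Selects up to `limit` pairs, ordered by value (the second element of each pair).
    static <K, V extends Comparable<V>> List<Map.Entry<K, V>> top(
            List<Map.Entry<K, V>> pairs, int limit, boolean maximize) {
        Comparator<Map.Entry<K, V>> byValue = Map.Entry.comparingByValue();
        if (maximize) {
            // Largest values first when the maximum N values are requested.
            byValue = byValue.reversed();
        }
        List<Map.Entry<K, V>> sorted = new ArrayList<>(pairs);
        sorted.sort(byValue);
        // Clamp the limit so it is never negative or larger than the input.
        int n = Math.min(Math.max(limit, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, n));
    }
}
```

With the pairs ("a", 3), ("b", 9), ("c", 5) and a limit of 2, maximize = true selects the pairs with values 9 and 5, while maximize = false selects the pairs with values 3 and 5.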

max

public static <S> PObject<S> max(PCollection<S> collect)
Returns the largest numerical element from the input collection.


min

public static <S> PObject<S> min(PCollection<S> collect)
Returns the smallest numerical element from the input collection.
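
The max and min methods above can be pictured with the following in-memory analogue. It is a sketch of the semantics, not Crunch code: the class name MinMaxSketch is hypothetical, a java.util.Collection stands in for the PCollection, and an Optional stands in for the PObject. Returning an empty Optional for an empty input is a choice made by this sketch; the documentation above does not say what happens in that case.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

// In-memory analogue of max and min over numeric elements; not part of the Crunch API.
final class MinMaxSketch {
    // Largest element by natural ordering, or empty if there are no elements.
    static <S extends Number & Comparable<S>> Optional<S> max(Collection<S> elements) {
        return elements.stream().max(Comparator.naturalOrder());
    }

    // Smallest element by natural ordering, or empty if there are no elements.
    static <S extends Number & Comparable<S>> Optional<S> min(Collection<S> elements) {
        return elements.stream().min(Comparator.naturalOrder());
    }
}
```

For the input 3, 7, 5 the sketch returns 7 from max and 3 from min.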


collectValues

public static <K,V> PTable<K,Collection<V>> collectValues(PTable<K,V> collect)
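
No description is given for this method, but the signature suggests that values sharing a key are gathered into one collection per key. Assuming that reading, an in-memory analogue might look like the sketch below. The class name CollectValuesSketch is hypothetical, a list of java.util.Map.Entry pairs stands in for the input PTable, a Map stands in for the result, and none of this is Crunch code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory analogue of grouping values by key; not part of the Crunch API.
final class CollectValuesSketch {
    // Gathers every value that shares a key into a single collection for that key.
    static <K, V> Map<K, Collection<V>> collectValues(List<Map.Entry<K, V>> pairs) {
        Map<K, Collection<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            // Create the collection on first sight of a key, then append the value.
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<V>())
                   .add(pair.getValue());
        }
        return grouped;
    }
}
```

With the pairs ("a", 1), ("b", 2), ("a", 3), the sketch maps "a" to the values 1 and 3 and "b" to the value 2.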

aggregate

public static <S> PCollection<S> aggregate(PCollection<S> collect,
                                           Aggregator<S> aggregator)


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.