Package org.apache.crunch.lib

Joining, sorting, aggregating, and other commonly used functionality.

Class Summary
Aggregate Methods for performing various types of aggregations over PCollection instances.
Aggregate.PairValueComparator<K,V>  
Aggregate.TopKCombineFn<K,V>  
Aggregate.TopKFn<K,V>  
Cartesian Utilities for Cartesian products of two PTable or PCollection instances.
Channels Utilities for splitting Pair instances emitted by a DoFn into separate PCollection instances.
Cogroup  
Distinct Functions for computing the distinct elements of a PCollection.
Join Utilities for joining multiple PTable instances based on a common key.
Mapred Static functions for working with legacy Mappers and Reducers that live under the org.apache.hadoop.mapred.* package as part of Crunch pipelines.
Mapreduce Static functions for working with legacy Mappers and Reducers that live under the org.apache.hadoop.mapreduce.* package as part of Crunch pipelines.
PTables Methods for performing common operations on PTables.
Sample Methods for performing random sampling in a distributed fashion, either by accepting each record in a PCollection with an independent probability (to sample some fraction of the overall data set) or by using reservoir sampling (to pull a uniform or weighted sample of fixed size from a PCollection of unknown size).
SecondarySort Utilities for performing a secondary sort on a PTable<K, Pair<V1, V2>> collection.
Set Utilities for performing set operations (difference, intersection, etc.) on PCollection instances.
Shard Utilities for controlling how the data in a PCollection is balanced across reducers and output files.
Sort Utilities for sorting PCollection instances.
Sort.ColumnOrder To sort by column 2 ascending and then column 1 descending, use sortPairs(coll, by(2, ASCENDING), by(1, DESCENDING)). Column numbering is 1-based.
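The class summary lists Aggregate.TopKFn and Aggregate.TopKCombineFn without descriptions. Their names suggest a per-partition top-K step followed by a combine step that merges partial results. Assuming that reading, the plain-Java sketch below illustrates the general bounded-heap technique on in-memory lists. TopKSketch, topK and combine are hypothetical names; this is not Crunch's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical in-memory sketch of a bounded-heap top-K; NOT Crunch's Aggregate.TopKFn.
final class TopKSketch {
    private TopKSketch() {}

    /** Returns the k largest values in descending order. */
    static List<Integer> topK(Iterable<Integer> values, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap of at most k elements; its head is the smallest of the current top k.
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) {
            if (heap.size() < k) {
                heap.add(v);
            } else if (v > heap.peek()) {
                heap.poll();   // evict the current smallest
                heap.add(v);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    /** Merges partial top-k lists (e.g. one per partition) into a global top-k. */
    static List<Integer> combine(List<List<Integer>> partials, int k) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> p : partials) {
            all.addAll(p);
        }
        return topK(all, k);
    }
}
```

Keeping only k elements per partition bounds memory at O(k) and lets a combine step shrink the data before the final merge.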

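The Sample entry mentions reservoir sampling for drawing a fixed-size sample from a collection of unknown size. The sketch below shows the uniform, single-machine version of that idea (Algorithm R) in plain Java. It does not cover the weighted or distributed variants, and ReservoirSketch and sample are hypothetical names, not Crunch's Sample API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of uniform reservoir sampling (Algorithm R); NOT Crunch's Sample API.
final class ReservoirSketch {
    private ReservoirSketch() {}

    /** Returns up to k items, each input item kept with equal probability. */
    static <T> List<T> sample(Iterable<T> input, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        for (T item : input) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item);            // fill the reservoir first
            } else {
                int j = rng.nextInt(seen);      // uniform in [0, seen)
                if (j < k) {
                    reservoir.set(j, item);     // replace with probability k / seen
                }
            }
        }
        return reservoir;
    }
}
```

The input is read once and memory stays at O(k), which is why the technique suits inputs whose size is not known in advance.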
Enum Summary
Sort.Order For signaling the order in which a sort should be done.
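Sort.ColumnOrder and Sort.Order together describe multi-column sorts such as sortPairs(coll, by(2, ASCENDING), by(1, DESCENDING)). The plain-Java sketch below mimics that call on an in-memory list of pairs by chaining one comparator per column. Its Order, ColumnOrder, by and sortPairs are local stand-ins that only mirror the documented names; they are not the Crunch types and do not run as a pipeline.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java analogue of sortPairs with column orders.
// Order, ColumnOrder, by() and sortPairs() are local stand-ins, NOT the Crunch types.
final class ColumnSortSketch {
    private ColumnSortSketch() {}

    enum Order { ASCENDING, DESCENDING }

    /** A 1-based column index paired with a sort direction. */
    static final class ColumnOrder {
        final int column;
        final Order order;

        ColumnOrder(int column, Order order) {
            this.column = column;
            this.order = order;
        }
    }

    static ColumnOrder by(int column, Order order) {
        return new ColumnOrder(column, order);
    }

    /** Sorts (first, second) pairs; earlier column orders take precedence. */
    static <T extends Comparable<? super T>, U extends Comparable<? super U>>
            List<Map.Entry<T, U>> sortPairs(List<Map.Entry<T, U>> pairs, ColumnOrder... orders) {
        Comparator<Map.Entry<T, U>> cmp = (a, b) -> 0;   // neutral starting comparator
        for (ColumnOrder co : orders) {
            Comparator<Map.Entry<T, U>> col;
            if (co.column == 1) {
                col = Comparator.comparing((Map.Entry<T, U> e) -> e.getKey());
            } else if (co.column == 2) {
                col = Comparator.comparing((Map.Entry<T, U> e) -> e.getValue());
            } else {
                throw new IllegalArgumentException("pairs have only columns 1 and 2: " + co.column);
            }
            if (co.order == Order.DESCENDING) {
                col = col.reversed();
            }
            cmp = cmp.thenComparing(col);   // later columns only break ties
        }
        List<Map.Entry<T, U>> out = new ArrayList<>(pairs);
        out.sort(cmp);   // List.sort is stable
        return out;
    }
}
```

For the pairs (a,2), (b,1), (c,2), (d,1), sorting by column 2 ascending and then column 1 descending yields d, b, c, a.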

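SecondarySort operates on a PTable<K, Pair<V1, V2>>. The usual meaning of a secondary sort is that records are grouped by K, and within each group the values arrive ordered by V1. The in-memory sketch below illustrates that grouping-plus-ordering contract in plain Java. Rec, SecondarySortSketch and secondarySort are hypothetical names; this is not Crunch's SecondarySort API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of a secondary sort; NOT Crunch's SecondarySort API.
final class SecondarySortSketch {
    private SecondarySortSketch() {}

    /** One input record: a key K with a value pair (V1, V2). */
    static final class Rec<K, V1, V2> {
        final K key;
        final V1 v1;
        final V2 v2;

        Rec(K key, V1 v1, V2 v2) {
            this.key = key;
            this.v1 = v1;
            this.v2 = v2;
        }
    }

    /** Groups records by key; inside each group, V2 values come out ordered by V1. */
    static <K extends Comparable<? super K>, V1 extends Comparable<? super V1>, V2>
            Map<K, List<V2>> secondarySort(List<Rec<K, V1, V2>> records) {
        // Sort on the composite (key, v1), as a shuffle keyed on both fields would.
        List<Rec<K, V1, V2>> sorted = new ArrayList<>(records);
        sorted.sort((a, b) -> {
            int byKey = a.key.compareTo(b.key);
            return byKey != 0 ? byKey : a.v1.compareTo(b.v1);
        });
        // Group on the key alone, so each group sees its values already ordered by v1.
        Map<K, List<V2>> grouped = new LinkedHashMap<>();
        for (Rec<K, V1, V2> r : sorted) {
            grouped.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r.v2);
        }
        return grouped;
    }
}
```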
Package org.apache.crunch.lib Description

Joining, sorting, aggregating, and other commonly used functionality.
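As an illustration of what joining PTable instances on a common key means, the sketch below performs an inner join of two in-memory key/value lists using a simple hash join. It shows only the result semantics: one output row for each matching (left, right) combination. It does not reflect the strategy Crunch's Join class uses in a distributed pipeline, and JoinSketch and innerJoin are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory inner hash join; illustrates semantics only, NOT Crunch's Join API.
final class JoinSketch {
    private JoinSketch() {}

    static <K, L, R> List<Map.Entry<K, Map.Entry<L, R>>> innerJoin(
            List<Map.Entry<K, L>> left, List<Map.Entry<K, R>> right) {
        // Build phase: index the right side by key (a key may repeat).
        Map<K, List<R>> index = new HashMap<>();
        for (Map.Entry<K, R> e : right) {
            index.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // Probe phase: emit one row per matching (left value, right value) combination.
        List<Map.Entry<K, Map.Entry<L, R>>> out = new ArrayList<>();
        for (Map.Entry<K, L> e : left) {
            for (R r : index.getOrDefault(e.getKey(), List.of())) {
                out.add(Map.entry(e.getKey(), Map.entry(e.getValue(), r)));
            }
        }
        return out;
    }
}
```

Keys present on only one side (3 and 4 in a users/orders example) produce no output rows, which is what distinguishes an inner join from outer variants.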

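The Cartesian entry refers to Cartesian products of two PTable or PCollection instances. The plain-Java sketch below shows the result on two in-memory lists: every left element is paired with every right element, so the output has left.size() * right.size() rows. That multiplicative growth is the main cost to keep in mind at scale. CartesianSketch and cross are hypothetical names, not Crunch's Cartesian API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of a Cartesian product; NOT Crunch's Cartesian API.
final class CartesianSketch {
    private CartesianSketch() {}

    /** Every (a, b) combination; the output has left.size() * right.size() elements. */
    static <A, B> List<Map.Entry<A, B>> cross(List<A> left, List<B> right) {
        List<Map.Entry<A, B>> out = new ArrayList<>(left.size() * right.size());
        for (A a : left) {
            for (B b : right) {
                out.add(Map.entry(a, b));   // Map.entry rejects null keys and values
            }
        }
        return out;
    }
}
```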

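The Set entry names difference and intersection among its operations. The sketch below shows plain set semantics for those two operations on in-memory collections: duplicates collapse, and order is kept only for readability. Whether Crunch's Set class de-duplicates in exactly this way is not stated on this page, so treat this as an illustration of the concept. SetOpsSketch and its methods are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical in-memory sketch of set difference and intersection; NOT Crunch's Set class.
final class SetOpsSketch {
    private SetOpsSketch() {}

    /** Distinct elements of a that do not appear in b. */
    static <T> Set<T> difference(Iterable<T> a, Iterable<T> b) {
        Set<T> result = toSet(a);
        for (T x : b) {
            result.remove(x);
        }
        return result;
    }

    /** Distinct elements that appear in both a and b. */
    static <T> Set<T> intersection(Iterable<T> a, Iterable<T> b) {
        Set<T> result = toSet(a);
        result.retainAll(toSet(b));
        return result;
    }

    private static <T> Set<T> toSet(Iterable<T> items) {
        Set<T> s = new LinkedHashSet<>();   // keeps first-seen order for readability
        for (T x : items) {
            s.add(x);
        }
        return s;
    }
}
```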

Copyright © 2013 The Apache Software Foundation. All Rights Reserved.