This project has retired. For details please refer to its Attic page.
org.apache.crunch.lib (Apache Crunch 0.6.0 API)

Package org.apache.crunch.lib

Joining, sorting, aggregating, and other commonly used functionality.


Class Summary
Aggregate: Methods for performing various types of aggregations over PCollection instances.
Aggregate.PairValueComparator<K,V>
Aggregate.TopKCombineFn<K,V>
Aggregate.TopKFn<K,V>
Cartesian: Utilities for Cartesian products of two PTable or PCollection instances.
Cogroup
Distinct: Functions for computing the distinct elements of a PCollection.
Join: Utilities for joining multiple PTable instances based on a common key.
PTables: Methods for performing common operations on PTables.
Sample: Methods for performing random sampling in a distributed fashion. A sample is drawn either by accepting each record in a PCollection independently with a fixed probability, which samples some fraction of the overall data set, or by reservoir sampling, which pulls a uniform or weighted sample of fixed size from a PCollection of unknown size.
SecondarySort: Utilities for performing a secondary sort on a PTable<K, Pair<V1, V2>> collection.
Set: Utilities for performing set operations (difference, intersection, etc.) on PCollection instances.
Sort: Utilities for sorting PCollection instances.
Sort.ColumnOrder: Describes the sort order for one column. For example, to sort by column 2 ascending and then by column 1 descending, use sortPairs(coll, by(2, ASCENDING), by(1, DESCENDING)). Column numbering is 1-based.
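The summary lists Aggregate.TopKFn and Aggregate.TopKCombineFn without descriptions. Their names suggest a two-stage top-K: each partition keeps its own best K entries, and a combine step reduces those partial lists. The sketch below illustrates that idea on in-memory lists, assuming entries are ranked by value; TopKSketch, topK, and combine are hypothetical names that use java.util types, not Crunch's PTable/Pair API or its actual implementation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch: top-k entries by value, computed per partition and then combined. */
class TopKSketch {
  static <K, V> Map.Entry<K, V> entry(K key, V value) {
    return new SimpleEntry<>(key, value);
  }

  /** Local step: keeps the k entries with the largest values using a min-heap of size k. */
  static <K, V extends Comparable<V>> List<Map.Entry<K, V>> topK(List<Map.Entry<K, V>> entries, int k) {
    Comparator<Map.Entry<K, V>> byValue = (a, b) -> a.getValue().compareTo(b.getValue());
    // The heap root is always the smallest of the entries kept so far.
    PriorityQueue<Map.Entry<K, V>> heap = new PriorityQueue<>(byValue);
    for (Map.Entry<K, V> e : entries) {
      heap.offer(e);
      if (heap.size() > k) {
        heap.poll(); // evict the smallest so at most k entries remain
      }
    }
    List<Map.Entry<K, V>> result = new ArrayList<>(heap);
    result.sort(byValue.reversed()); // largest value first
    return result;
  }

  /** Combine step: the top k of the union of per-partition top-k lists is the global top k. */
  static <K, V extends Comparable<V>> List<Map.Entry<K, V>> combine(List<List<Map.Entry<K, V>>> partials, int k) {
    List<Map.Entry<K, V>> merged = new ArrayList<>();
    for (List<Map.Entry<K, V>> partial : partials) {
      merged.addAll(partial);
    }
    return topK(merged, k);
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> part1 = Arrays.asList(entry("a", 5), entry("b", 1), entry("c", 9));
    List<Map.Entry<String, Integer>> part2 = Arrays.asList(entry("d", 7), entry("e", 3));
    List<Map.Entry<String, Integer>> top2 = combine(Arrays.asList(topK(part1, 2), topK(part2, 2)), 2);
    System.out.println(top2); // prints [c=9, d=7]
  }
}
```

Keeping only k entries per partition bounds the data sent to the combine step, which is the usual motivation for splitting top-K into a local and a global stage.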
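A Cartesian product pairs every element of one input with every element of the other, so its output size is the product of the input sizes. The sketch below shows that behavior on in-memory lists; CartesianSketch and cross are hypothetical names, and Map.Entry stands in for a pair type. It is not Crunch's Cartesian API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch of a Cartesian product of two collections. */
class CartesianSketch {
  /** Pairs every element of left with every element of right, giving |left| * |right| results. */
  static <U, V> List<Map.Entry<U, V>> cross(List<U> left, List<V> right) {
    List<Map.Entry<U, V>> out = new ArrayList<>(left.size() * right.size());
    for (U u : left) {
      for (V v : right) {
        out.add(new SimpleEntry<>(u, v));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<Integer, String>> pairs = cross(Arrays.asList(1, 2), Arrays.asList("x", "y", "z"));
    System.out.println(pairs); // prints [1=x, 1=y, 1=z, 2=x, 2=y, 2=z]
  }
}
```

Because the output grows multiplicatively, a cross product is usually reserved for cases where at least one input is small.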
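Computing distinct elements depends only on element equality. On a single machine this reduces to a hash set, as in the sketch below, assuming the elements have consistent equals and hashCode implementations. DistinctSketch is a hypothetical name. The first-seen ordering it produces comes from LinkedHashSet; do not assume that ordering from a distributed pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical in-memory sketch of computing the distinct elements of a collection. */
class DistinctSketch {
  /** Returns each element once; LinkedHashSet keeps first-seen order in this sketch. */
  static <T> List<T> distinct(List<T> input) {
    return new ArrayList<>(new LinkedHashSet<>(input));
  }

  public static void main(String[] args) {
    System.out.println(distinct(Arrays.asList(3, 1, 3, 2, 1))); // prints [3, 1, 2]
  }
}
```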
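Joining on a common key pairs each left value with every right value that shares its key. The sketch below is an in-memory inner hash join of two tables, assuming inner-join semantics and using Map.Entry as a stand-in for Crunch's Pair. JoinSketch and innerJoin are hypothetical names; a distributed join would typically bring matching keys together through the shuffle rather than by building an in-memory index.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch of an inner join of two key/value tables on their common key. */
class JoinSketch {
  static <K, V> Map.Entry<K, V> entry(K key, V value) {
    return new SimpleEntry<>(key, value);
  }

  static <K, U, V> List<Map.Entry<K, Map.Entry<U, V>>> innerJoin(
      List<Map.Entry<K, U>> left, List<Map.Entry<K, V>> right) {
    // Index the right side by key; a key may map to several values.
    Map<K, List<V>> rightByKey = new HashMap<>();
    for (Map.Entry<K, V> r : right) {
      rightByKey.computeIfAbsent(r.getKey(), key -> new ArrayList<>()).add(r.getValue());
    }
    // Emit one (key, (leftValue, rightValue)) record per matching combination.
    List<Map.Entry<K, Map.Entry<U, V>>> out = new ArrayList<>();
    for (Map.Entry<K, U> l : left) {
      List<V> matches = rightByKey.get(l.getKey());
      if (matches == null) {
        continue; // inner join: keys missing from the right side are dropped
      }
      for (V v : matches) {
        out.add(entry(l.getKey(), entry(l.getValue(), v)));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<Integer, String>> left = Arrays.asList(entry(1, "a"), entry(2, "b"), entry(2, "c"));
    List<Map.Entry<Integer, String>> right = Arrays.asList(entry(2, "x"), entry(3, "y"));
    System.out.println(innerJoin(left, right)); // prints [2=b=x, 2=c=x]
  }
}
```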
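The Sample entry names two strategies. The sketch below implements both on a single machine: Bernoulli sampling, where each record is kept independently with probability p, and uniform reservoir sampling (the classic Algorithm R), which keeps a fixed-size sample from a stream whose length is not known in advance. SampleSketch, bernoulli, and reservoir are hypothetical names; this is an illustration of the techniques, not Crunch's distributed implementation, and it omits the weighted variant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical single-machine sketches of the two sampling strategies named in the Sample summary. */
class SampleSketch {
  /** Bernoulli sampling: keeps each element independently with probability p (output size is random). */
  static <T> List<T> bernoulli(Iterable<T> input, double p, Random rnd) {
    List<T> out = new ArrayList<>();
    for (T t : input) {
      if (rnd.nextDouble() < p) {
        out.add(t);
      }
    }
    return out;
  }

  /** Uniform reservoir sampling (Algorithm R): one pass, returns min(k, n) elements. */
  static <T> List<T> reservoir(Iterable<T> input, int k, Random rnd) {
    List<T> sample = new ArrayList<>(k);
    int seen = 0; // an int counter is enough for this in-memory sketch
    for (T t : input) {
      seen++;
      if (sample.size() < k) {
        sample.add(t); // the first k elements fill the reservoir
      } else {
        // The seen-th element replaces a random slot with probability k / seen.
        int j = rnd.nextInt(seen); // uniform in [0, seen)
        if (j < k) {
          sample.set(j, t);
        }
      }
    }
    return sample;
  }

  public static void main(String[] args) {
    List<Integer> data = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      data.add(i);
    }
    Random rnd = new Random(42);
    System.out.println("Bernoulli, p = 0.1: " + bernoulli(data, 0.1, rnd).size() + " elements");
    System.out.println("Reservoir, k = 5: " + reservoir(data, 5, rnd));
  }
}
```

The trade-off mirrors the summary: Bernoulli sampling fixes the fraction but not the size, while a reservoir fixes the size without needing to know the input length.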
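A secondary sort on records of the form (K, (V1, V2)) groups them by K and presents each group's values ordered by V1. The in-memory sketch below shows that contract with a TreeMap for the groups and a per-group sort. SecondarySortSketch and groupAndSort are hypothetical names, and Map.Entry stands in for Pair; a distributed implementation would normally obtain the ordering from the shuffle rather than by sorting each group in memory.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory sketch of secondary-sort semantics for records of the form (K, (V1, V2)). */
class SecondarySortSketch {
  static <A, B> Map.Entry<A, B> entry(A a, B b) {
    return new SimpleEntry<>(a, b);
  }

  static <K extends Comparable<K>, V1 extends Comparable<V1>, V2>
      Map<K, List<Map.Entry<V1, V2>>> groupAndSort(List<Map.Entry<K, Map.Entry<V1, V2>>> records) {
    // Group by the primary key K; the TreeMap also keeps the groups in key order.
    Map<K, List<Map.Entry<V1, V2>>> groups = new TreeMap<>();
    for (Map.Entry<K, Map.Entry<V1, V2>> r : records) {
      groups.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
    }
    // Within each group, order the values by the secondary key V1.
    Comparator<Map.Entry<V1, V2>> byV1 = (a, b) -> a.getKey().compareTo(b.getKey());
    for (List<Map.Entry<V1, V2>> values : groups.values()) {
      values.sort(byV1);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Map.Entry<Integer, String>>> records = Arrays.asList(
        entry("u1", entry(3, "c")), entry("u2", entry(1, "z")),
        entry("u1", entry(1, "a")), entry("u1", entry(2, "b")));
    System.out.println(groupAndSort(records)); // prints {u1=[1=a, 2=b, 3=c], u2=[1=z]}
  }
}
```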
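Set operations treat each input as a set of elements. The sketch below shows difference and intersection on in-memory collections with java.util.Set, assuming set semantics in which duplicate elements collapse to one. SetOpsSketch and its methods are hypothetical names, not Crunch's Set class.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical in-memory sketch of set difference and intersection; duplicates collapse. */
class SetOpsSketch {
  /** Distinct elements of a that do not occur in b. */
  static <T> Set<T> difference(Collection<T> a, Collection<T> b) {
    Set<T> out = new LinkedHashSet<>(a);
    out.removeAll(new HashSet<>(b));
    return out;
  }

  /** Distinct elements that occur in both a and b. */
  static <T> Set<T> intersection(Collection<T> a, Collection<T> b) {
    Set<T> out = new LinkedHashSet<>(a);
    out.retainAll(new HashSet<>(b));
    return out;
  }

  public static void main(String[] args) {
    System.out.println(difference(Arrays.asList(1, 2, 2, 3), Arrays.asList(2, 4)));   // prints [1, 3]
    System.out.println(intersection(Arrays.asList(1, 2, 2, 3), Arrays.asList(2, 4))); // prints [2]
  }
}
```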
 

Enum Summary
Sort.Order: Specifies the order in which a sort should be done.
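The Sort.ColumnOrder example sorts pairs by the second column ascending and then by the first column descending, with 1-based column numbers. The sketch below reproduces that ordering rule on in-memory pairs by chaining java.util.Comparator instances. The nested Order enum, ColumnOrder class, and sortPairs method are hypothetical stand-ins that mirror the names in the summary; they are not the Crunch classes themselves, and Map.Entry stands in for Pair (column 1 is the key, column 2 the value).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch of multi-column pair sorting with 1-based column numbers. */
class ColumnSortSketch {
  enum Order { ASCENDING, DESCENDING }

  /** One sort key: a 1-based column index plus a direction. */
  static final class ColumnOrder {
    final int column;
    final Order order;

    private ColumnOrder(int column, Order order) {
      this.column = column;
      this.order = order;
    }

    static ColumnOrder by(int column, Order order) {
      return new ColumnOrder(column, order);
    }
  }

  static <A, B> Map.Entry<A, B> entry(A a, B b) {
    return new SimpleEntry<>(a, b);
  }

  /** Sorts a copy of the pairs; earlier ColumnOrders take precedence, later ones break ties. */
  static <U extends Comparable<U>, V extends Comparable<V>> List<Map.Entry<U, V>> sortPairs(
      List<Map.Entry<U, V>> pairs, ColumnOrder... orders) {
    Comparator<Map.Entry<U, V>> cmp = (a, b) -> 0; // start with "all pairs equal"
    for (ColumnOrder co : orders) {
      Comparator<Map.Entry<U, V>> col;
      if (co.column == 1) {
        col = (a, b) -> a.getKey().compareTo(b.getKey());
      } else if (co.column == 2) {
        col = (a, b) -> a.getValue().compareTo(b.getValue());
      } else {
        throw new IllegalArgumentException("pairs only have columns 1 and 2: " + co.column);
      }
      if (co.order == Order.DESCENDING) {
        col = col.reversed();
      }
      cmp = cmp.thenComparing(col);
    }
    List<Map.Entry<U, V>> out = new ArrayList<>(pairs);
    out.sort(cmp);
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<Integer, String>> pairs =
        Arrays.asList(entry(1, "b"), entry(2, "a"), entry(3, "b"), entry(1, "a"));
    // Column 2 ascending, then column 1 descending.
    System.out.println(sortPairs(pairs, ColumnOrder.by(2, Order.ASCENDING), ColumnOrder.by(1, Order.DESCENDING)));
    // prints [2=a, 1=a, 3=b, 1=b]
  }
}
```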
 




Copyright © 2013 The Apache Software Foundation. All Rights Reserved.