Class | Description |
---|---|
Aggregate | Methods for performing various types of aggregations over PCollection instances. |
Aggregate.PairValueComparator<K,V> | |
Aggregate.TopKCombineFn<K,V> | |
Aggregate.TopKFn<K,V> | |
Average | |
Cartesian | Utilities for Cartesian products of two PTable or PCollection instances. |
Channels | |
Cogroup | |
Distinct | Functions for computing the distinct elements of a PCollection. |
DoFns | |
Join | Utilities for joining multiple PTable instances based on a common key. |
Mapred | Static functions for working with legacy Mappers and Reducers that live under the org.apache.hadoop.mapred.* package as part of Crunch pipelines. |
Mapreduce | Static functions for working with legacy Mappers and Reducers that live under the org.apache.hadoop.mapreduce.* package as part of Crunch pipelines. |
PTables | Methods for performing common operations on PTables. |
Quantiles | |
Quantiles.Result<V> | Output type for storing the results of a Quantiles computation. |
Sample | Methods for performing random sampling in a distributed fashion, either by accepting each record in a PCollection with an independent probability in order to sample some fraction of the overall data set, or by using reservoir sampling in order to pull a uniform or weighted sample of fixed size from a PCollection of an unknown size. |
SecondarySort | Utilities for performing a secondary sort on a PTable<K, Pair<V1, V2>> collection. |
Set | Utilities for performing set operations (difference, intersection, etc.) on PCollection instances. |
Shard | Utilities for controlling how the data in a PCollection is balanced across reducers and output files. |
Sort | Utilities for sorting PCollection instances. |
Sort.ColumnOrder | To sort by column 2 ascending then column 1 descending, you would use: `sortPairs(coll, by(2, ASCENDING), by(1, DESCENDING))`. Column numbering is 1-based. |
TopList | Tools for creating top lists of items in PTables and PCollections. |
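
The fixed-size sampling mentioned for Sample in the table above relies on reservoir sampling. The following is a minimal single-machine sketch of the uniform variant in plain Java, included only to illustrate the idea. It does not use the Crunch API and is not Crunch's implementation; the class name `ReservoirSketch` and the method `sample` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class; not part of Crunch.
final class ReservoirSketch {

    /** Returns up to k items drawn uniformly from a stream of unknown length. */
    static <T> List<T> sample(Iterator<T> items, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (items.hasNext()) {
            T item = items.next();
            seen++;
            if (reservoir.size() < k) {
                // The first k items always enter the reservoir.
                reservoir.add(item);
            } else {
                // Pick a slot in [0, seen); the item is kept with probability k / seen.
                long slot = (long) (rng.nextDouble() * seen);
                if (slot < k) {
                    reservoir.set((int) slot, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        // Prints three of the seven words; which ones depends on the seed.
        System.out.println(sample(words.iterator(), 3, new Random(7)));
    }
}
```

Because the stream is consumed in one pass and only `k` items are held at a time, the total size never needs to be known in advance, which is the property the Sample description refers to.
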
Enum | Description |
---|---|
Sort.Order | For signaling the order in which a sort should be done. |
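
To make the 1-based column ordering described for Sort.ColumnOrder concrete, here is a small plain-Java sketch that mimics "column 2 ascending, then column 1 descending" on in-memory rows. It is an illustration under stated assumptions, not the Crunch API: the class `ColumnOrderSketch`, its nested `Order` and `ColumnOrder` types, and the `by` and `sorted` helpers are hypothetical stand-ins, and rows are modeled as `int[]` for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of Crunch.
final class ColumnOrderSketch {

    enum Order { ASCENDING, DESCENDING }

    /** Stand-in for a (column, order) pair; columns are numbered from 1. */
    static final class ColumnOrder {
        final int column;
        final Order order;

        ColumnOrder(int column, Order order) {
            this.column = column;
            this.order = order;
        }
    }

    static ColumnOrder by(int column, Order order) {
        return new ColumnOrder(column, order);
    }

    /** Returns a sorted copy of rows, comparing columns in the order given. */
    static List<int[]> sorted(List<int[]> rows, ColumnOrder... orders) {
        Comparator<int[]> cmp = (a, b) -> 0; // start with "all rows equal"
        for (ColumnOrder co : orders) {
            int idx = co.column - 1; // convert 1-based column to 0-based index
            Comparator<int[]> c = Comparator.comparingInt(row -> row[idx]);
            if (co.order == Order.DESCENDING) {
                c = c.reversed();
            }
            cmp = cmp.thenComparing(c);
        }
        List<int[]> copy = new ArrayList<>(rows);
        copy.sort(cmp);
        return copy;
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(
            new int[] {1, 2}, new int[] {3, 1}, new int[] {2, 2}, new int[] {5, 1});
        for (int[] row : sorted(rows, by(2, Order.ASCENDING), by(1, Order.DESCENDING))) {
            System.out.println(Arrays.toString(row));
        }
        // Prints [5, 1], [3, 1], [2, 2], [1, 2], one row per line.
    }
}
```

Earlier column orders take priority; later ones only break ties, which is why the two rows with a 1 in column 2 come first and are then ordered by column 1 descending.
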
Copyright © 2016 The Apache Software Foundation. All rights reserved.