Package | Description |
---|---|
org.apache.crunch |
Client-facing API and core abstractions.
|
org.apache.crunch.contrib.bloomfilter |
Support for creating Bloom Filters.
|
org.apache.crunch.contrib.text | |
org.apache.crunch.fn |
Commonly used functions for manipulating collections.
|
org.apache.crunch.impl.dist | |
org.apache.crunch.impl.dist.collect | |
org.apache.crunch.impl.mem |
In-memory Pipeline implementation for rapid prototyping and testing.
|
org.apache.crunch.impl.spark | |
org.apache.crunch.impl.spark.collect | |
org.apache.crunch.impl.spark.fn | |
org.apache.crunch.lambda |
Alternative Crunch API using Java 8 features to allow construction of pipelines using lambda functions and method
references.
|
org.apache.crunch.lib |
Joining, sorting, aggregating, and other commonly used functionality.
|
org.apache.crunch.lib.join |
Inner and outer joins on collections.
|
org.apache.crunch.types |
Common functionality for business object serialization.
|
org.apache.crunch.types.avro |
Business object serialization using Apache Avro.
|
org.apache.crunch.types.writable |
Business object serialization using Hadoop's Writables framework.
|
org.apache.crunch.util |
An assorted set of utilities.
|
Modifier and Type | Method and Description |
---|---|
static <T,U> Pair<T,U> |
Pair.of(T first,
U second) |
Modifier and Type | Method and Description |
---|---|
<U> PTable<K,Pair<Collection<V>,Collection<U>>> |
PTable.cogroup(PTable<K,U> other)
Co-group operation with the given table on common keys.
|
<U> PTable<K,Pair<V,U>> |
PTable.join(PTable<K,U> other)
Perform an inner join on this table and the one passed in as an argument on
their common keys.
|
Modifier and Type | Method and Description |
---|---|
int |
Pair.compareTo(Pair<K,V> o) |
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
Pipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype)
Creates a
PTable containing the values found in the given Iterable
using an implementation-specific distribution mechanism. |
<K,V> PTable<K,V> |
Pipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype,
CreateOptions options)
Creates a
PTable containing the values found in the given Iterable
using an implementation-specific distribution mechanism. |
PTable<K,V> |
PTable.filter(FilterFn<Pair<K,V>> filterFn)
Apply the given filter function to this instance and return the resulting
PTable . |
PTable<K,V> |
PTable.filter(String name,
FilterFn<Pair<K,V>> filterFn)
Apply the given filter function to this instance and return the resulting
PTable . |
<K,V> PTable<K,V> |
PCollection.parallelDo(DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type)
Similar to the other
parallelDo instance, but returns a
PTable instance instead of a PCollection . |
<K,V> PTable<K,V> |
PCollection.parallelDo(String name,
DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type)
Similar to the other
parallelDo instance, but returns a
PTable instance instead of a PCollection . |
<K,V> PTable<K,V> |
PCollection.parallelDo(String name,
DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type,
ParallelDoOptions options)
Similar to the other
parallelDo instance, but returns a
PTable instance instead of a PCollection . |
Modifier and Type | Method and Description |
---|---|
void |
BloomFilterFn.cleanup(Emitter<Pair<String,org.apache.hadoop.util.bloom.BloomFilter>> emitter) |
void |
BloomFilterFn.process(S input,
Emitter<Pair<String,org.apache.hadoop.util.bloom.BloomFilter>> emitter) |
Modifier and Type | Method and Description |
---|---|
static <K,V> Extractor<Pair<K,V>> |
Extractors.xpair(TokenizerFactory scannerFactory,
Extractor<K> one,
Extractor<V> two)
Returns an Extractor for pairs of the given types that uses the given
TokenizerFactory
for parsing the sub-fields. |
Modifier and Type | Method and Description |
---|---|
static <K,V> PTable<K,V> |
Parse.parseTable(String groupName,
PCollection<String> input,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input
PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> . |
static <K,V> PTable<K,V> |
Parse.parseTable(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input
PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> that uses the given PTypeFamily . |
Modifier and Type | Method and Description |
---|---|
Pair<S,T> |
PairMapFn.map(Pair<K,V> input) |
Pair<V2,V1> |
SwapFn.map(Pair<V1,V2> input) |
Pair<K,V> |
SPairFunction.map(T input) |
Pair<K,V> |
ExtractKeyFn.map(V input) |
Modifier and Type | Method and Description |
---|---|
static <V1,V2> Aggregator<Pair<V1,V2>> |
Aggregators.pairAggregator(Aggregator<V1> a1,
Aggregator<V2> a2)
Apply separate aggregators to each component of a
Pair . |
static <V1,V2> PType<Pair<V2,V1>> |
SwapFn.ptype(PType<Pair<V1,V2>> pt) |
Modifier and Type | Method and Description |
---|---|
R |
SFunction2.map(Pair<K,V> input) |
Pair<S,T> |
PairMapFn.map(Pair<K,V> input) |
Pair<V2,V1> |
SwapFn.map(Pair<V1,V2> input) |
void |
SFlatMapFunction2.process(Pair<K,V> input,
Emitter<R> emitter) |
Modifier and Type | Method and Description |
---|---|
void |
PairMapFn.cleanup(Emitter<Pair<S,T>> emitter) |
void |
SPairFlatMapFunction.process(T input,
Emitter<Pair<K,V>> emitter) |
static <V1,V2> PType<Pair<V2,V1>> |
SwapFn.ptype(PType<Pair<V1,V2>> pt) |
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
DistributedPipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype) |
<K,V> PTable<K,V> |
DistributedPipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype,
CreateOptions options) |
Modifier and Type | Method and Description |
---|---|
<U> PTable<K,Pair<Collection<V>,Collection<U>>> |
PTableBase.cogroup(PTable<K,U> other) |
PType<Pair<K,V>> |
EmptyPTable.getPType() |
PType<Pair<K,V>> |
BaseUnionTable.getPType() |
PType<Pair<K,V>> |
BaseInputTable.getPType() |
PType<Pair<K,Iterable<V>>> |
BaseGroupedTable.getPType() |
PType<Pair<K,V>> |
BaseDoTable.getPType() |
<U> PTable<K,Pair<V,U>> |
PTableBase.join(PTable<K,U> other) |
Modifier and Type | Method and Description |
---|---|
<S,K,V> BaseDoTable<K,V> |
PCollectionFactory.createDoTable(String name,
PCollectionImpl<S> chainingCollection,
CombineFn<K,V> combineFn,
DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type) |
<S,K,V> BaseDoTable<K,V> |
PCollectionFactory.createDoTable(String name,
PCollectionImpl<S> chainingCollection,
DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type,
ParallelDoOptions options) |
PTable<K,V> |
PTableBase.filter(FilterFn<Pair<K,V>> filterFn) |
PTable<K,V> |
PTableBase.filter(String name,
FilterFn<Pair<K,V>> filterFn) |
<K,V> PTable<K,V> |
PCollectionImpl.parallelDo(DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type) |
<K,V> PTable<K,V> |
PCollectionImpl.parallelDo(String name,
DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type) |
<K,V> PTable<K,V> |
PCollectionImpl.parallelDo(String name,
DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type,
ParallelDoOptions options) |
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
MemPipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype) |
<K,V> PTable<K,V> |
MemPipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype,
CreateOptions options) |
static <S,T> PTable<S,T> |
MemPipeline.tableOf(Iterable<Pair<S,T>> pairs) |
static <S,T> PTable<S,T> |
MemPipeline.typedTableOf(PTableType<S,T> ptype,
Iterable<Pair<S,T>> pairs) |
Modifier and Type | Method and Description |
---|---|
static <K,V> com.google.common.base.Function<Pair<K,V>,scala.Tuple2<K,V>> |
GuavaUtils.pair2tupleFunc() |
static <K,V> com.google.common.base.Function<scala.Tuple2<K,V>,Pair<K,V>> |
GuavaUtils.tuple2PairFunc() |
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
SparkPipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype,
CreateOptions options) |
Modifier and Type | Method and Description |
---|---|
PType<Pair<K,V>> |
CreatedTable.getPType() |
Modifier and Type | Method and Description |
---|---|
<S,K,V> BaseDoTable<K,V> |
SparkCollectFactory.createDoTable(String name,
PCollectionImpl<S> parent,
CombineFn<K,V> combineFn,
DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type) |
<S,K,V> BaseDoTable<K,V> |
SparkCollectFactory.createDoTable(String name,
PCollectionImpl<S> parent,
DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type,
ParallelDoOptions options) |
Constructor and Description |
---|
CreatedTable(SparkPipeline pipeline,
Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype,
CreateOptions options) |
Modifier and Type | Method and Description |
---|---|
Pair<K,Iterable<V>> |
ReduceInputFunction.call(scala.Tuple2<ByteArray,Iterable<byte[]>> kv) |
Modifier and Type | Method and Description |
---|---|
scala.Tuple2<S,Iterable<T>> |
PairMapIterableFunction.call(Pair<K,List<V>> input) |
scala.Tuple2<K,V> |
Tuple2MapFunction.call(Pair<K,V> p) |
scala.Tuple2<IntByteArray,byte[]> |
PartitionedMapOutputFunction.call(Pair<K,V> p) |
scala.Tuple2<ByteArray,byte[]> |
MapOutputFunction.call(Pair<K,V> p) |
Modifier and Type | Method and Description |
---|---|
Iterable<scala.Tuple2<K,V>> |
CrunchPairTuple2.call(Iterator<Pair<K,V>> iterator) |
Constructor and Description |
---|
FlatMapPairDoFn(DoFn<Pair<K,V>,T> fn,
SparkRuntimeContext ctxt) |
PairFlatMapDoFn(DoFn<T,Pair<K,V>> fn,
SparkRuntimeContext ctxt) |
PairMapFunction(MapFn<Pair<K,V>,S> fn,
SparkRuntimeContext ctxt) |
PairMapIterableFunction(MapFn<Pair<K,List<V>>,Pair<S,Iterable<T>>> fn,
SparkRuntimeContext runtimeContext) |
PairMapIterableFunction(MapFn<Pair<K,List<V>>,Pair<S,Iterable<T>>> fn,
SparkRuntimeContext runtimeContext) |
Tuple2MapFunction(MapFn<Pair<K,V>,Pair<K,V>> fn,
SparkRuntimeContext ctxt) |
Tuple2MapFunction(MapFn<Pair<K,V>,Pair<K,V>> fn,
SparkRuntimeContext ctxt) |
Modifier and Type | Method and Description |
---|---|
default <U> LTable<K,Pair<Collection<V>,Collection<U>>> |
LTable.cogroup(LTable<K,U> other)
Cogroup this table with another
LTable with the same key type, collecting the set of values from
each side. |
default <U> LTable<K,Pair<V,U>> |
LTable.join(LTable<K,U> other)
Inner join this table to another
LTable which has the same key type using a reduce-side join |
default <U> LTable<K,Pair<V,U>> |
LTable.join(LTable<K,U> other,
JoinType joinType)
Join this table to another
LTable which has the same key type using the provide JoinType and
the DefaultJoinStrategy (reduce-side join). |
default <U> LTable<K,Pair<V,U>> |
LTable.join(LTable<K,U> other,
JoinType joinType,
JoinStrategy<K,V,U> joinStrategy)
Join this table to another
LTable which has the same key type using the provided JoinType and
JoinStrategy |
Modifier and Type | Method and Description |
---|---|
default LTable<K,V> |
LTable.filter(SPredicate<Pair<K,V>> predicate)
Filter the rows of the table using the supplied predicate.
|
default <K,V> LTable<K,V> |
LCollection.filterMap(SFunction<S,Optional<Pair<K,V>>> fn,
PTableType<K,V> pType)
Combination of a filter and map operation by using a function with
Optional return type. |
default <K,V> LTable<K,V> |
LCollection.flatMap(SFunction<S,java.util.stream.Stream<Pair<K,V>>> fn,
PTableType<K,V> pType)
Map each element to zero or more output elements using the provided stream-returning function to yield an
LTable |
default LTable<K,V> |
LTable.incrementIf(Enum<?> counter,
SPredicate<Pair<K,V>> condition)
Increment a counter for every element satisfying the conditional predicate supplied.
|
default LTable<K,V> |
LTable.incrementIf(String counterGroup,
String counterName,
SPredicate<Pair<K,V>> condition)
Increment a counter for every element satisfying the conditional predicate supplied.
|
default <K,V> LTable<K,V> |
LCollection.map(SFunction<S,Pair<K,V>> fn,
PTableType<K,V> pType)
Map the elements of this collection 1-1 through the supplied function to yield an
LTable |
default <K,V> LTable<K,V> |
LCollection.parallelDo(DoFn<S,Pair<K,V>> fn,
PTableType<K,V> pType)
|
default <K,V> LTable<K,V> |
LCollection.parallelDo(LDoFn<S,Pair<K,V>> fn,
PTableType<K,V> pType)
Transform this LCollection using a Lambda-friendly
LDoFn . |
Modifier and Type | Method and Description |
---|---|
static <K,V> Pair<K,V> |
PTables.getDetachedValue(PTableType<K,V> tableType,
Pair<K,V> value)
Create a detached value for a table
Pair . |
static <K,V> Pair<K,Iterable<V>> |
PTables.getGroupedDetachedValue(PGroupedTableType<K,V> groupedTableType,
Pair<K,Iterable<V>> value)
Created a detached value for a
PGroupedTable value. |
static <T,U> Pair<PCollection<T>,PCollection<U>> |
Channels.split(PCollection<Pair<T,U>> pCollection)
Splits a
PCollection of any Pair of objects into a Pair of
PCollection}, to allow for the output of a DoFn to be handled using
separate channels. |
static <T,U> Pair<PCollection<T>,PCollection<U>> |
Channels.split(PCollection<Pair<T,U>> pCollection,
PType<T> firstPType,
PType<U> secondPType)
Splits a
PCollection of any Pair of objects into a Pair of
PCollection}, to allow for the output of a DoFn to be handled using
separate channels. |
Modifier and Type | Method and Description |
---|---|
static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> |
Cogroup.cogroup(int numReducers,
PTable<K,U> left,
PTable<K,V> right)
Co-groups the two
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> |
Cogroup.cogroup(PTable<K,U> left,
PTable<K,V> right)
Co-groups the two
PTable arguments. |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified
PCollection s (using the
same strategy as Pig's CROSS operator). |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified
PCollection s (using the
same strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right,
int parallelism)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right,
int parallelism)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K,V,T> DoFn<Pair<K,Iterable<V>>,T> |
DoFns.detach(DoFn<Pair<K,Iterable<V>>,T> reduceFn,
PType<V> valueType)
"Reduce" DoFn wrapper which detaches the values in the iterable, preventing the unexpected behaviour related to
object reuse often observed when using Avro.
|
static <K,U,V> PTable<K,Pair<U,V>> |
Join.fullJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a full outer join on the specified
PTable s. |
static <T,N extends Number> |
Sample.groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
int[] sampleSizes)
The most general purpose of the weighted reservoir sampling patterns that allows us to choose
a random sample of elements for each of N input groups.
|
static <T,N extends Number> |
Sample.groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
int[] sampleSizes,
Long seed)
Same as the other groupedWeightedReservoirSample method, but include a seed for testing
purposes.
|
static <K,U,V> PTable<K,Pair<U,V>> |
Join.innerJoin(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.join(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.leftJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a left outer join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.rightJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a right outer join on the specified
PTable s. |
static <U,V> PCollection<Pair<U,V>> |
Sort.sortPairs(PCollection<Pair<U,V>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Pair s using the specified column
ordering. |
static <X,Y> PTable<X,Collection<Pair<Long,Y>>> |
TopList.topNYbyX(PTable<X,Y> input,
int n)
Create a top-list of elements in the provided PTable, categorised by the key of the input table and using the count
of the value part of the input table.
|
Modifier and Type | Method and Description |
---|---|
int |
Aggregate.PairValueComparator.compare(Pair<K,V> left,
Pair<K,V> right) |
int |
Aggregate.PairValueComparator.compare(Pair<K,V> left,
Pair<K,V> right) |
static <K,V> Pair<K,V> |
PTables.getDetachedValue(PTableType<K,V> tableType,
Pair<K,V> value)
Create a detached value for a table
Pair . |
static <K,V> Pair<K,Iterable<V>> |
PTables.getGroupedDetachedValue(PGroupedTableType<K,V> groupedTableType,
Pair<K,Iterable<V>> value)
Created a detached value for a
PGroupedTable value. |
void |
Aggregate.TopKCombineFn.process(Pair<Integer,Iterable<Pair<K,V>>> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKFn.process(Pair<K,V> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
Modifier and Type | Method and Description |
---|---|
static <K,V> PTable<K,V> |
PTables.asPTable(PCollection<Pair<K,V>> pcollect)
Convert the given
PCollection<Pair<K, V>> to a PTable<K, V> . |
void |
Aggregate.TopKFn.cleanup(Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKFn.cleanup(Emitter<Pair<Integer,Pair<K,V>>> emitter) |
static <K,V,T> DoFn<Pair<K,Iterable<V>>,T> |
DoFns.detach(DoFn<Pair<K,Iterable<V>>,T> reduceFn,
PType<V> valueType)
"Reduce" DoFn wrapper which detaches the values in the iterable, preventing the unexpected behaviour related to
object reuse often observed when using Avro.
|
static <T,N extends Number> |
Sample.groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
int[] sampleSizes)
The most general purpose of the weighted reservoir sampling patterns that allows us to choose
a random sample of elements for each of N input groups.
|
static <T,N extends Number> |
Sample.groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
int[] sampleSizes,
Long seed)
Same as the other groupedWeightedReservoirSample method, but include a seed for testing
purposes.
|
void |
Aggregate.TopKCombineFn.process(Pair<Integer,Iterable<Pair<K,V>>> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKCombineFn.process(Pair<Integer,Iterable<Pair<K,V>>> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKCombineFn.process(Pair<Integer,Iterable<Pair<K,V>>> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKFn.process(Pair<K,V> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKFn.process(Pair<K,V> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> . |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> . |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> . |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> . |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype,
int numReducers)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> , using
the given number of reducers. |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype,
int numReducers)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> , using
the given number of reducers. |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype,
int numReducers)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> , using
the given number of reducers. |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype,
int numReducers)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> , using
the given number of reducers. |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> . |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> . |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> . |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype,
int numReducers)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> , using
the given number of reducers. |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype,
int numReducers)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> , using
the given number of reducers. |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype,
int numReducers)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> , using
the given number of reducers. |
static <U,V> PCollection<Pair<U,V>> |
Sort.sortPairs(PCollection<Pair<U,V>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Pair s using the specified column
ordering. |
static <T,U> Pair<PCollection<T>,PCollection<U>> |
Channels.split(PCollection<Pair<T,U>> pCollection)
Splits a
PCollection of any Pair of objects into a Pair of
PCollection}, to allow for the output of a DoFn to be handled using
separate channels. |
static <T,U> Pair<PCollection<T>,PCollection<U>> |
Channels.split(PCollection<Pair<T,U>> pCollection,
PType<T> firstPType,
PType<U> secondPType)
Splits a
PCollection of any Pair of objects into a Pair of
PCollection}, to allow for the output of a DoFn to be handled using
separate channels. |
static <T,N extends Number> |
Sample.weightedReservoirSample(PCollection<Pair<T,N>> input,
int sampleSize)
Selects a weighted sample of the elements of the given
PCollection , where the second term in
the input Pair is a numerical weight. |
static <T,N extends Number> |
Sample.weightedReservoirSample(PCollection<Pair<T,N>> input,
int sampleSize,
Long seed)
The weighted reservoir sampling function with the seed term exposed for testing purposes.
|
Constructor and Description |
---|
Result(long count,
Iterable<Pair<Double,V>> quantiles) |
TopKCombineFn(int limit,
boolean maximize,
PType<Pair<K,V>> pairType) |
TopKFn(int limit,
boolean ascending,
PType<Pair<K,V>> pairType) |
Modifier and Type | Method and Description |
---|---|
PTable<K,Pair<U,V>> |
DefaultJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinFn<K,U,V> joinFn)
Perform a default join on the given
PTable instances using a user-specified JoinFn . |
PTable<K,Pair<U,V>> |
ShardedJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
MapsideJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
JoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType)
Join two tables with the given join type.
|
PTable<K,Pair<U,V>> |
DefaultJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
BloomFilterJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
Modifier and Type | Method and Description |
---|---|
void |
JoinFn.process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
Emitter<Pair<K,Pair<U,V>>> emitter)
Split up the input record to make coding a bit more manageable.
|
Modifier and Type | Method and Description |
---|---|
void |
LeftOuterJoinFn.cleanup(Emitter<Pair<K,Pair<U,V>>> emitter)
Called during the cleanup of the MapReduce job this
DoFn is
associated with. |
void |
LeftOuterJoinFn.cleanup(Emitter<Pair<K,Pair<U,V>>> emitter)
Called during the cleanup of the MapReduce job this
DoFn is
associated with. |
void |
FullOuterJoinFn.cleanup(Emitter<Pair<K,Pair<U,V>>> emitter)
Called during the cleanup of the MapReduce job this
DoFn is
associated with. |
void |
FullOuterJoinFn.cleanup(Emitter<Pair<K,Pair<U,V>>> emitter)
Called during the cleanup of the MapReduce job this
DoFn is
associated with. |
void |
RightOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
RightOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
RightOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
LeftOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
LeftOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
LeftOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
abstract void |
JoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
abstract void |
JoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
abstract void |
JoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
InnerJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter) |
void |
InnerJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter) |
void |
InnerJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter) |
void |
FullOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
FullOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
FullOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
static <K,U,V,T> PCollection<T> |
OneToManyJoin.oneToManyJoin(PTable<K,U> left,
PTable<K,V> right,
DoFn<Pair<U,Iterable<V>>,T> postProcessFn,
PType<T> ptype)
Performs a join on two tables, where the left table only contains a single
value per key.
|
static <K,U,V,T> PCollection<T> |
OneToManyJoin.oneToManyJoin(PTable<K,U> left,
PTable<K,V> right,
DoFn<Pair<U,Iterable<V>>,T> postProcessFn,
PType<T> ptype,
int numReducers)
Supports a user-specified number of reducers for the one-to-many join.
|
void |
JoinFn.process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
Emitter<Pair<K,Pair<U,V>>> emitter)
Split up the input record to make coding a bit more manageable.
|
void |
JoinFn.process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
Emitter<Pair<K,Pair<U,V>>> emitter)
Split up the input record to make coding a bit more manageable.
|
void |
JoinFn.process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
Emitter<Pair<K,Pair<U,V>>> emitter)
Split up the input record to make coding a bit more manageable.
|
void |
JoinFn.process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
Emitter<Pair<K,Pair<U,V>>> emitter)
Split up the input record to make coding a bit more manageable.
|
Modifier and Type | Field and Description |
---|---|
static TupleFactory<Pair> |
TupleFactory.PAIR |
Modifier and Type | Method and Description |
---|---|
Pair<K,Iterable<V>> |
PGroupedTableType.PairIterableMapFn.map(Pair<Object,Iterable<Object>> input) |
Modifier and Type | Method and Description |
---|---|
ReadableSourceTarget<Pair<K,Iterable<V>>> |
PGroupedTableType.getDefaultFileSource(org.apache.hadoop.fs.Path path) |
<V1,V2> PType<Pair<V1,V2>> |
PTypeFamily.pairs(PType<V1> p1,
PType<V2> p2) |
Modifier and Type | Method and Description |
---|---|
Pair<K,Iterable<V>> |
PGroupedTableType.PairIterableMapFn.map(Pair<Object,Iterable<Object>> input) |
Modifier and Type | Method and Description |
---|---|
<V1,V2> PType<Pair<V1,V2>> |
AvroTypeFamily.pairs(PType<V1> p1,
PType<V2> p2) |
static <V1,V2> AvroType<Pair<V1,V2>> |
Avros.pairs(PType<V1> p1,
PType<V2> p2) |
Modifier and Type | Method and Description |
---|---|
<V1,V2> PType<Pair<V1,V2>> |
WritableTypeFamily.pairs(PType<V1> p1,
PType<V2> p2) |
static <V1,V2> WritableType<Pair<V1,V2>,TupleWritable> |
Writables.pairs(PType<V1> p1,
PType<V2> p2) |
Modifier and Type | Method and Description |
---|---|
Iterator<Pair<S,T>> |
Tuples.PairIterable.iterator() |
Copyright © 2016 The Apache Software Foundation. All rights reserved.