Package | Description |
---|---|
org.apache.crunch |
Client-facing API and core abstractions.
|
org.apache.crunch.contrib.bloomfilter |
Support for creating Bloom Filters.
|
org.apache.crunch.contrib.text | |
org.apache.crunch.fn |
Commonly used functions for manipulating collections.
|
org.apache.crunch.impl.mem |
In-memory Pipeline implementation for rapid prototyping and testing.
|
org.apache.crunch.lib |
Joining, sorting, aggregating, and other commonly used functionality.
|
org.apache.crunch.lib.join |
Inner and outer joins on collections.
|
org.apache.crunch.types |
Common functionality for business object serialization.
|
org.apache.crunch.types.avro |
Business object serialization using Apache Avro.
|
org.apache.crunch.types.writable |
Business object serialization using Hadoop's Writables framework.
|
org.apache.crunch.util |
An assorted set of utilities.
|
Modifier and Type | Method and Description |
---|---|
static <T,U> Pair<T,U> |
Pair.of(T first,
U second) |
Modifier and Type | Method and Description |
---|---|
<U> PTable<K,Pair<Collection<V>,Collection<U>>> |
PTable.cogroup(PTable<K,U> other)
Co-group operation with the given table on common keys.
|
<U> PTable<K,Pair<V,U>> |
PTable.join(PTable<K,U> other)
Perform an inner join on this table and the one passed in as an argument on
their common keys.
|
static <K,V1,V2> CombineFn<K,Pair<V1,V2>> |
CombineFn.pairAggregator(CombineFn.AggregatorFactory<V1> a1,
CombineFn.AggregatorFactory<V2> a2)
Deprecated.
|
Iterable<Pair<V1,V2>> |
CombineFn.PairAggregator.results()
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
int |
Pair.compareTo(Pair<K,V> o) |
void |
CombineFn.AggregatorCombineFn.process(Pair<K,Iterable<V>> input,
Emitter<Pair<K,V>> emitter)
Deprecated.
|
void |
CombineFn.PairAggregator.update(Pair<V1,V2> value)
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
PTable<K,V> |
PTable.filter(FilterFn<Pair<K,V>> filterFn)
Apply the given filter function to this instance and return the resulting
PTable . |
PTable<K,V> |
PTable.filter(String name,
FilterFn<Pair<K,V>> filterFn)
Apply the given filter function to this instance and return the resulting
PTable . |
<K,V> PTable<K,V> |
PCollection.parallelDo(DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type)
Similar to the other
parallelDo instance, but returns a
PTable instance instead of a PCollection . |
<K,V> PTable<K,V> |
PCollection.parallelDo(String name,
DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type)
Similar to the other
parallelDo instance, but returns a
PTable instance instead of a PCollection . |
<K,V> PTable<K,V> |
PCollection.parallelDo(String name,
DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type,
ParallelDoOptions options)
Similar to the other
parallelDo instance, but returns a
PTable instance instead of a PCollection . |
void |
CombineFn.AggregatorCombineFn.process(Pair<K,Iterable<V>> input,
Emitter<Pair<K,V>> emitter)
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
void |
BloomFilterFn.cleanup(Emitter<Pair<String,org.apache.hadoop.util.bloom.BloomFilter>> emitter) |
void |
BloomFilterFn.process(S input,
Emitter<Pair<String,org.apache.hadoop.util.bloom.BloomFilter>> emitter) |
Modifier and Type | Method and Description |
---|---|
static <K,V> Extractor<Pair<K,V>> |
Extractors.xpair(TokenizerFactory scannerFactory,
Extractor<K> one,
Extractor<V> two)
Returns an Extractor for pairs of the given types that uses the given
TokenizerFactory
for parsing the sub-fields. |
Modifier and Type | Method and Description |
---|---|
static <K,V> PTable<K,V> |
Parse.parseTable(String groupName,
PCollection<String> input,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input
PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> . |
static <K,V> PTable<K,V> |
Parse.parseTable(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input
PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> that uses the given PTypeFamily . |
Modifier and Type | Method and Description |
---|---|
Pair<S,T> |
PairMapFn.map(Pair<K,V> input) |
Pair<K,V> |
ExtractKeyFn.map(V input) |
Modifier and Type | Method and Description |
---|---|
static <V1,V2> Aggregator<Pair<V1,V2>> |
Aggregators.pairAggregator(Aggregator<V1> a1,
Aggregator<V2> a2)
Apply separate aggregators to each component of a
Pair . |
Modifier and Type | Method and Description |
---|---|
Pair<S,T> |
PairMapFn.map(Pair<K,V> input) |
void |
MapKeysFn.process(Pair<K1,V> input,
Emitter<Pair<K2,V>> emitter) |
void |
MapValuesFn.process(Pair<K,V1> input,
Emitter<Pair<K,V2>> emitter) |
Modifier and Type | Method and Description |
---|---|
void |
PairMapFn.cleanup(Emitter<Pair<S,T>> emitter) |
void |
MapKeysFn.process(Pair<K1,V> input,
Emitter<Pair<K2,V>> emitter) |
void |
MapValuesFn.process(Pair<K,V1> input,
Emitter<Pair<K,V2>> emitter) |
Modifier and Type | Method and Description |
---|---|
static <S,T> PTable<S,T> |
MemPipeline.tableOf(Iterable<Pair<S,T>> pairs) |
static <S,T> PTable<S,T> |
MemPipeline.typedTableOf(PTableType<S,T> ptype,
Iterable<Pair<S,T>> pairs) |
Modifier and Type | Method and Description |
---|---|
static <K,V> Pair<K,V> |
PTables.getDetachedValue(PTableType<K,V> tableType,
Pair<K,V> value)
Create a detached value for a table
Pair . |
static <K,V> Pair<K,Iterable<V>> |
PTables.getGroupedDetachedValue(PGroupedTableType<K,V> groupedTableType,
Pair<K,Iterable<V>> value)
Created a detached value for a
PGroupedTable value. |
Modifier and Type | Method and Description |
---|---|
static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> |
Cogroup.cogroup(PTable<K,U> left,
PTable<K,V> right)
Co-groups the two
PTable arguments. |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified
PCollection s (using the
same strategy as Pig's CROSS operator). |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified
PCollection s (using the
same strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right,
int parallelism)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right,
int parallelism)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.fullJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a full outer join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.innerJoin(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.join(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.join(PTable<K,U> left,
PTable<K,V> right,
JoinFn<K,U,V> joinFn) |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.leftJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a left outer join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.rightJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a right outer join on the specified
PTable s. |
static <U,V> PCollection<Pair<U,V>> |
Sort.sortPairs(PCollection<Pair<U,V>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Pair s using the specified column
ordering. |
Modifier and Type | Method and Description |
---|---|
int |
Aggregate.PairValueComparator.compare(Pair<K,V> left,
Pair<K,V> right) |
int |
Aggregate.PairValueComparator.compare(Pair<K,V> left,
Pair<K,V> right) |
static <K,V> Pair<K,V> |
PTables.getDetachedValue(PTableType<K,V> tableType,
Pair<K,V> value)
Create a detached value for a table
Pair . |
static <K,V> Pair<K,Iterable<V>> |
PTables.getGroupedDetachedValue(PGroupedTableType<K,V> groupedTableType,
Pair<K,Iterable<V>> value)
Created a detached value for a
PGroupedTable value. |
void |
Aggregate.TopKCombineFn.process(Pair<Integer,Iterable<Pair<K,V>>> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKFn.process(Pair<K,V> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
Modifier and Type | Method and Description |
---|---|
static <K,V> PTable<K,V> |
PTables.asPTable(PCollection<Pair<K,V>> pcollect)
Convert the given
PCollection<Pair<K, V>> to a PTable<K, V> . |
void |
Aggregate.TopKFn.cleanup(Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKFn.cleanup(Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKCombineFn.process(Pair<Integer,Iterable<Pair<K,V>>> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKCombineFn.process(Pair<Integer,Iterable<Pair<K,V>>> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKCombineFn.process(Pair<Integer,Iterable<Pair<K,V>>> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKFn.process(Pair<K,V> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
void |
Aggregate.TopKFn.process(Pair<K,V> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter) |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> . |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> . |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> . |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> . |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> . |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> . |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> . |
static <U,V> PCollection<Pair<U,V>> |
Sort.sortPairs(PCollection<Pair<U,V>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Pair s using the specified column
ordering. |
Modifier and Type | Method and Description |
---|---|
static <K,U,V> PTable<K,Pair<U,V>> |
MapsideJoin.join(PTable<K,U> left,
PTable<K,V> right)
Join two tables using a map side join.
|
Modifier and Type | Method and Description |
---|---|
void |
JoinFn.process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
Emitter<Pair<K,Pair<U,V>>> emitter)
Split up the input record to make coding a bit more manageable.
|
Modifier and Type | Method and Description |
---|---|
void |
LeftOuterJoinFn.cleanup(Emitter<Pair<K,Pair<U,V>>> emitter)
Called during the cleanup of the MapReduce job this
DoFn is
associated with. |
void |
LeftOuterJoinFn.cleanup(Emitter<Pair<K,Pair<U,V>>> emitter)
Called during the cleanup of the MapReduce job this
DoFn is
associated with. |
void |
FullOuterJoinFn.cleanup(Emitter<Pair<K,Pair<U,V>>> emitter)
Called during the cleanup of the MapReduce job this
DoFn is
associated with. |
void |
FullOuterJoinFn.cleanup(Emitter<Pair<K,Pair<U,V>>> emitter)
Called during the cleanup of the MapReduce job this
DoFn is
associated with. |
void |
RightOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
RightOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
RightOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
LeftOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
LeftOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
LeftOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
abstract void |
JoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
abstract void |
JoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
abstract void |
JoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
InnerJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
InnerJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
InnerJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
FullOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
FullOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
FullOuterJoinFn.join(K key,
int id,
Iterable<Pair<U,V>> pairs,
Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.
|
void |
JoinFn.process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
Emitter<Pair<K,Pair<U,V>>> emitter)
Split up the input record to make coding a bit more manageable.
|
void |
JoinFn.process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
Emitter<Pair<K,Pair<U,V>>> emitter)
Split up the input record to make coding a bit more manageable.
|
void |
JoinFn.process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
Emitter<Pair<K,Pair<U,V>>> emitter)
Split up the input record to make coding a bit more manageable.
|
void |
JoinFn.process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
Emitter<Pair<K,Pair<U,V>>> emitter)
Split up the input record to make coding a bit more manageable.
|
Modifier and Type | Field and Description |
---|---|
static TupleFactory<Pair> |
TupleFactory.PAIR |
Modifier and Type | Method and Description |
---|---|
Pair<K,Iterable<V>> |
PGroupedTableType.PairIterableMapFn.map(Pair<Object,Iterable<Object>> input) |
Modifier and Type | Method and Description |
---|---|
SourceTarget<Pair<K,Iterable<V>>> |
PGroupedTableType.getDefaultFileSource(org.apache.hadoop.fs.Path path) |
<V1,V2> PType<Pair<V1,V2>> |
PTypeFamily.pairs(PType<V1> p1,
PType<V2> p2) |
Modifier and Type | Method and Description |
---|---|
Pair<K,Iterable<V>> |
PGroupedTableType.PairIterableMapFn.map(Pair<Object,Iterable<Object>> input) |
Modifier and Type | Method and Description |
---|---|
static <V1,V2> AvroType<Pair<V1,V2>> |
Avros.pairs(PType<V1> p1,
PType<V2> p2) |
<V1,V2> PType<Pair<V1,V2>> |
AvroTypeFamily.pairs(PType<V1> p1,
PType<V2> p2) |
Modifier and Type | Method and Description |
---|---|
static <V1,V2> WritableType<Pair<V1,V2>,TupleWritable> |
Writables.pairs(PType<V1> p1,
PType<V2> p2) |
<V1,V2> PType<Pair<V1,V2>> |
WritableTypeFamily.pairs(PType<V1> p1,
PType<V2> p2) |
Modifier and Type | Method and Description |
---|---|
Iterator<Pair<S,T>> |
Tuples.PairIterable.iterator() |
Copyright © 2013 The Apache Software Foundation. All Rights Reserved.