Package | Description |
---|---|
org.apache.crunch |
Client-facing API and core abstractions.
|
org.apache.crunch.contrib.text | |
org.apache.crunch.examples |
Example applications demonstrating various aspects of Crunch.
|
org.apache.crunch.impl.mem |
In-memory Pipeline implementation for rapid prototyping and testing.
|
org.apache.crunch.impl.mr |
A Pipeline implementation that runs on Hadoop MapReduce.
|
org.apache.crunch.lib |
Joining, sorting, aggregating, and other commonly used functionality.
|
org.apache.crunch.lib.join |
Inner and outer joins on collections.
|
org.apache.crunch.util |
An assorted set of utilities.
|
Modifier and Type | Method and Description |
---|---|
PTable<K,V> |
PTable.bottom(int count)
Returns a PTable made up of the pairs in this PTable with the smallest
value field.
|
<K> PTable<K,S> |
PCollection.by(MapFn<S,K> extractKeyFn,
PType<K> keyType)
Apply the given map function to each element of this instance in order to
create a PTable. |
<K> PTable<K,S> |
PCollection.by(String name,
MapFn<S,K> extractKeyFn,
PType<K> keyType)
Apply the given map function to each element of this instance in order to
create a PTable. |
<U> PTable<K,Pair<Collection<V>,Collection<U>>> |
PTable.cogroup(PTable<K,U> other)
Co-group operation with the given table on common keys.
|
PTable<K,Collection<V>> |
PTable.collectValues()
Aggregate all of the values with the same key into a single key-value pair
in the returned PTable.
|
PTable<K,V> |
PGroupedTable.combineValues(Aggregator<V> aggregator)
Combine the values in each group using the given Aggregator. |
PTable<K,V> |
PGroupedTable.combineValues(CombineFn<K,V> combineFn)
Combines the values of this grouping using the given CombineFn. |
PTable<S,Long> |
PCollection.count()
Returns a PTable instance that contains the counts of each unique
element of this PCollection. |
PTable<K,V> |
PTable.filter(FilterFn<Pair<K,V>> filterFn)
Apply the given filter function to this instance and return the resulting
PTable. |
PTable<K,V> |
PTable.filter(String name,
FilterFn<Pair<K,V>> filterFn)
Apply the given filter function to this instance and return the resulting
PTable. |
<U> PTable<K,Pair<V,U>> |
PTable.join(PTable<K,U> other)
Perform an inner join on this table and the one passed in as an argument on
their common keys.
|
<K,V> PTable<K,V> |
PCollection.parallelDo(DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type)
Similar to the other parallelDo instance, but returns a
PTable instance instead of a PCollection. |
<K,V> PTable<K,V> |
PCollection.parallelDo(String name,
DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type)
Similar to the other parallelDo instance, but returns a
PTable instance instead of a PCollection. |
<K,V> PTable<K,V> |
PCollection.parallelDo(String name,
DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type,
ParallelDoOptions options)
Similar to the other parallelDo instance, but returns a
PTable instance instead of a PCollection. |
<K,V> PTable<K,V> |
Pipeline.read(TableSource<K,V> tableSource)
A version of the read method for TableSource instances that map to
PTables. |
PTable<K,V> |
PTable.top(int count)
Returns a PTable made up of the pairs in this PTable with the largest value
field.
|
PTable<K,V> |
PGroupedTable.ungroup()
Convert this grouping back into a multimap.
|
PTable<K,V> |
PTable.union(PTable<K,V>... others)
Returns a PTable instance that acts as the union of this
PTable and the input PTables. |
PTable<K,V> |
PTable.write(Target target)
Writes this PTable to the given Target. |
PTable<K,V> |
PTable.write(Target target,
Target.WriteMode writeMode)
Writes this PTable to the given Target, using the
given Target.WriteMode to handle existing targets. |
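
The contracts of `PCollection.count()` and `PTable.top(int count)` listed above can be illustrated without a cluster. The following is a minimal plain-Java sketch of those semantics only; it does not use the Crunch API, and the class `CountAndTopSketch` and its methods are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountAndTopSketch {

    // Stand-in for the documented behaviour of PCollection.count():
    // map each unique element to the number of times it occurs.
    public static <S> Map<S, Long> count(List<S> elements) {
        Map<S, Long> counts = new LinkedHashMap<>();
        for (S element : elements) {
            counts.merge(element, 1L, Long::sum);
        }
        return counts;
    }

    // Stand-in for the documented behaviour of PTable.top(int count):
    // keep the n pairs with the largest value field.
    public static <K> List<Map.Entry<K, Long>> top(Map<K, Long> table, int n) {
        List<Map.Entry<K, Long>> entries = new ArrayList<>(table.entrySet());
        entries.sort(Map.Entry.<K, Long>comparingByValue().reversed());
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = count(Arrays.asList("a", "b", "a", "c", "a", "b"));
        System.out.println(counts);
        System.out.println(top(counts, 2));
    }
}
```

`PTable.bottom(int count)` would follow the same pattern with the comparator left unreversed.
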
Modifier and Type | Method and Description |
---|---|
<U> PTable<K,Pair<Collection<V>,Collection<U>>> |
PTable.cogroup(PTable<K,U> other)
Co-group operation with the given table on common keys.
|
<U> PTable<K,Pair<V,U>> |
PTable.join(PTable<K,U> other)
Perform an inner join on this table and the one passed in as an argument on
their common keys.
|
PTable<K,V> |
PTable.union(PTable<K,V>... others)
Returns a PTable instance that acts as the union of this
PTable and the input PTables. |
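
The return type of `PTable.cogroup`, `PTable<K,Pair<Collection<V>,Collection<U>>>`, is easier to read with a concrete example. The sketch below models the two tables as lists of key-value pairs in plain Java and is not Crunch code. It assumes that a key present in only one table still appears in the result with an empty collection on the other side; `CogroupSketch` and `pair` are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CogroupSketch {

    // Hypothetical helper: Map.Entry stands in for Crunch's Pair.
    public static <A, B> Map.Entry<A, B> pair(A a, B b) {
        return new AbstractMap.SimpleEntry<A, B>(a, b);
    }

    private static <V, U> Map.Entry<List<V>, List<U>> emptyGroup() {
        return new AbstractMap.SimpleEntry<List<V>, List<U>>(new ArrayList<V>(), new ArrayList<U>());
    }

    // For every key, collect all left values and all right values.
    // Assumption: keys found on only one side keep an empty list for the other.
    public static <K, V, U> Map<K, Map.Entry<List<V>, List<U>>> cogroup(
            List<Map.Entry<K, V>> left, List<Map.Entry<K, U>> right) {
        Map<K, Map.Entry<List<V>, List<U>>> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : left) {
            result.computeIfAbsent(e.getKey(), k -> CogroupSketch.<V, U>emptyGroup())
                  .getKey().add(e.getValue());
        }
        for (Map.Entry<K, U> e : right) {
            result.computeIfAbsent(e.getKey(), k -> CogroupSketch.<V, U>emptyGroup())
                  .getValue().add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> left =
                Arrays.asList(pair("a", 1), pair("a", 2), pair("b", 3));
        List<Map.Entry<String, String>> right =
                Arrays.asList(pair("a", "x"), pair("c", "y"));
        System.out.println(cogroup(left, right));
    }
}
```
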
Modifier and Type | Method and Description |
---|---|
static <K,V> PTable<K,V> |
Parse.parseTable(String groupName,
PCollection<String> input,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>>. |
static <K,V> PTable<K,V> |
Parse.parseTable(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> that uses the given PTypeFamily. |
Modifier and Type | Method and Description |
---|---|
PTable<String,String> |
WordAggregationHBase.extractText(PTable<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.hbase.client.Result> words)
Extracts information from HBase.
|
Modifier and Type | Method and Description |
---|---|
PCollection<org.apache.hadoop.hbase.client.Put> |
WordAggregationHBase.createPut(PTable<String,String> extractedText)
Creates Puts in order to insert them into HBase.
|
PTable<String,String> |
WordAggregationHBase.extractText(PTable<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.hbase.client.Result> words)
Extracts information from HBase.
|
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
MemPipeline.read(TableSource<K,V> source) |
static <S,T> PTable<S,T> |
MemPipeline.tableOf(Iterable<Pair<S,T>> pairs) |
static <S,T> PTable<S,T> |
MemPipeline.tableOf(S s,
T t,
Object... more) |
static <S,T> PTable<S,T> |
MemPipeline.typedTableOf(PTableType<S,T> ptype,
Iterable<Pair<S,T>> pairs) |
static <S,T> PTable<S,T> |
MemPipeline.typedTableOf(PTableType<S,T> ptype,
S s,
T t,
Object... more) |
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
MRPipeline.read(TableSource<K,V> source) |
Modifier and Type | Method and Description |
---|---|
static <K,V> PTable<K,V> |
PTables.asPTable(PCollection<Pair<K,V>> pcollect)
Convert the given PCollection<Pair<K, V>> to a PTable<K, V>. |
static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> |
Cogroup.cogroup(PTable<K,U> left,
PTable<K,V> right)
Co-groups the two PTable arguments. |
static <K,V> PTable<K,Collection<V>> |
Aggregate.collectValues(PTable<K,V> collect) |
static <S> PTable<S,Long> |
Aggregate.count(PCollection<S> collect)
Returns a PTable that contains the unique elements of this collection mapped to a count
of their occurrences. |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right)
Performs a full cross join on the specified PTables (using the same
strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right,
int parallelism)
Performs a full cross join on the specified PTables (using the same
strategy as Pig's CROSS operator). |
static <K,V> PTable<K,V> |
Distinct.distinct(PTable<K,V> input)
A PTable<K, V> analogue of the distinct function. |
static <K,V> PTable<K,V> |
Distinct.distinct(PTable<K,V> input,
int flushEvery)
A PTable<K, V> analogue of the distinct function. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.fullJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a full outer join on the specified PTables. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.innerJoin(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified PTables. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.join(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified PTables. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.join(PTable<K,U> left,
PTable<K,V> right,
JoinFn<K,U,V> joinFn) |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.leftJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a left outer join on the specified PTables. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.rightJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a right outer join on the specified PTables. |
static <K,V> PTable<K,V> |
Sample.sample(PTable<K,V> input,
double probability)
A PTable<K, V> analogue of the sample function. |
static <K,V> PTable<K,V> |
Sample.sample(PTable<K,V> input,
long seed,
double probability)
A PTable<K, V> analogue of the sample function. |
static <K,V> PTable<K,V> |
Sort.sort(PTable<K,V> table)
Sorts the PTable using the natural ordering of its keys. |
static <K,V> PTable<K,V> |
Sort.sort(PTable<K,V> table,
Sort.Order key)
Sorts the PTable using the natural ordering of its keys in the
order specified. |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V>. |
static <K,V> PTable<K,V> |
Aggregate.top(PTable<K,V> ptable,
int limit,
boolean maximize) |
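
The four `Join` variants above differ only in which keys survive. The sketch below shows that difference in plain Java and is not Crunch code. It simplifies each table to one value per key, and it assumes that an outer join represents a missing side as `null`; `JoinSketch` and `JoinType` are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class JoinSketch {

    public enum JoinType { INNER, LEFT, RIGHT, FULL }

    // Joins two single-valued tables on their keys.
    // Assumption: a side with no match contributes null to the output pair.
    public static <K, U, V> Map<K, Map.Entry<U, V>> join(
            Map<K, U> left, Map<K, V> right, JoinType type) {
        Set<K> keys = new LinkedHashSet<>(left.keySet());
        keys.addAll(right.keySet());
        Map<K, Map.Entry<U, V>> out = new LinkedHashMap<>();
        for (K key : keys) {
            boolean inLeft = left.containsKey(key);
            boolean inRight = right.containsKey(key);
            boolean keep;
            switch (type) {
                case INNER: keep = inLeft && inRight; break; // both sides required
                case LEFT:  keep = inLeft;            break; // every left key survives
                case RIGHT: keep = inRight;           break; // every right key survives
                default:    keep = true;              break; // FULL: every key survives
            }
            if (keep) {
                out.put(key, new AbstractMap.SimpleEntry<U, V>(left.get(key), right.get(key)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> left = new LinkedHashMap<>();
        left.put("a", 1);
        left.put("b", 2);
        Map<String, String> right = new LinkedHashMap<>();
        right.put("b", "x");
        right.put("c", "y");
        for (JoinType type : JoinType.values()) {
            System.out.println(type + " -> " + join(left, right, type).keySet());
        }
    }
}
```
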
Modifier and Type | Method and Description |
---|---|
static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> |
Cogroup.cogroup(PTable<K,U> left,
PTable<K,V> right)
Co-groups the two PTable arguments. |
static <K,V> PTable<K,Collection<V>> |
Aggregate.collectValues(PTable<K,V> collect) |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right)
Performs a full cross join on the specified PTables (using the same
strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right,
int parallelism)
Performs a full cross join on the specified PTables (using the same
strategy as Pig's CROSS operator). |
static <K,V> PTable<K,V> |
Distinct.distinct(PTable<K,V> input)
A PTable<K, V> analogue of the distinct function. |
static <K,V> PTable<K,V> |
Distinct.distinct(PTable<K,V> input,
int flushEvery)
A PTable<K, V> analogue of the distinct function. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.fullJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a full outer join on the specified PTables. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.innerJoin(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified PTables. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.join(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified PTables. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.join(PTable<K,U> left,
PTable<K,V> right,
JoinFn<K,U,V> joinFn) |
static <K,V> PCollection<K> |
PTables.keys(PTable<K,V> ptable)
Extract the keys from the given PTable<K, V> as a PCollection<K>. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.leftJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a left outer join on the specified PTables. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.rightJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a right outer join on the specified PTables. |
static <K,V> PTable<K,V> |
Sample.sample(PTable<K,V> input,
double probability)
A PTable<K, V> analogue of the sample function. |
static <K,V> PTable<K,V> |
Sample.sample(PTable<K,V> input,
long seed,
double probability)
A PTable<K, V> analogue of the sample function. |
static <K,V> PTable<K,V> |
Sort.sort(PTable<K,V> table)
Sorts the PTable using the natural ordering of its keys. |
static <K,V> PTable<K,V> |
Sort.sort(PTable<K,V> table,
Sort.Order key)
Sorts the PTable using the natural ordering of its keys in the
order specified. |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V>. |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T>. |
static <K,V> PTable<K,V> |
Aggregate.top(PTable<K,V> ptable,
int limit,
boolean maximize) |
static <K,V> PCollection<V> |
PTables.values(PTable<K,V> ptable)
Extract the values from the given PTable<K, V> as a PCollection<V>. |
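
The input shape of `SecondarySort.sortAndApply`, a `PTable<K,Pair<V1,V2>>` handed to a `DoFn` over `Pair<K,Iterable<Pair<V1,V2>>>`, suggests a group-then-sort step. The plain-Java sketch below illustrates that idea and is not Crunch code. It assumes the values of each key are ordered by the first element of the value pair (`V1`); `SecondarySortSketch`, `record`, and `groupAndSort` are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // Hypothetical helper building one (key, (first, second)) record.
    public static <K, V1, V2> Map.Entry<K, Map.Entry<V1, V2>> record(K key, V1 first, V2 second) {
        Map.Entry<V1, V2> value = new AbstractMap.SimpleEntry<V1, V2>(first, second);
        return new AbstractMap.SimpleEntry<K, Map.Entry<V1, V2>>(key, value);
    }

    // Groups records by key, then sorts each group's values by V1.
    // Assumption: V1 is the sort field, ordered by its natural ordering.
    public static <K, V1 extends Comparable<? super V1>, V2>
            Map<K, List<Map.Entry<V1, V2>>> groupAndSort(List<Map.Entry<K, Map.Entry<V1, V2>>> input) {
        Map<K, List<Map.Entry<V1, V2>>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, Map.Entry<V1, V2>> rec : input) {
            grouped.computeIfAbsent(rec.getKey(), k -> new ArrayList<Map.Entry<V1, V2>>())
                   .add(rec.getValue());
        }
        for (List<Map.Entry<V1, V2>> values : grouped.values()) {
            values.sort(Map.Entry.<V1, V2>comparingByKey());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Map.Entry<Integer, String>>> input = Arrays.asList(
                record("u1", 3, "c"),
                record("u2", 1, "x"),
                record("u1", 1, "a"),
                record("u1", 2, "b"));
        // Each group could now be passed to a per-key function, as a DoFn would be.
        System.out.println(groupAndSort(input));
    }
}
```
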
Modifier and Type | Method and Description |
---|---|
static <K,U,V> PTable<K,Pair<U,V>> |
MapsideJoin.join(PTable<K,U> left,
PTable<K,V> right)
Join two tables using a map side join.
|
Modifier and Type | Method and Description |
---|---|
static <K,U,V> PTable<K,Pair<U,V>> |
MapsideJoin.join(PTable<K,U> left,
PTable<K,V> right)
Join two tables using a map side join.
|
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
CrunchTool.read(TableSource<K,V> tableSource) |
Copyright © 2013 The Apache Software Foundation. All Rights Reserved.