Package | Description |
---|---|
org.apache.crunch |
Client-facing API and core abstractions.
|
org.apache.crunch.contrib.text | |
org.apache.crunch.examples |
Example applications demonstrating various aspects of Crunch.
|
org.apache.crunch.impl.dist | |
org.apache.crunch.impl.dist.collect | |
org.apache.crunch.impl.mem |
In-memory Pipeline implementation for rapid prototyping and testing.
|
org.apache.crunch.impl.spark | |
org.apache.crunch.impl.spark.collect | |
org.apache.crunch.lambda |
Alternative Crunch API using Java 8 features to allow construction of pipelines using lambda functions and method
references.
|
org.apache.crunch.lib |
Joining, sorting, aggregating, and other commonly used functionality.
|
org.apache.crunch.lib.join |
Inner and outer joins on collections.
|
org.apache.crunch.util |
An assorted set of utilities.
|
Modifier and Type | Method and Description |
---|---|
PTable<K,V> |
PTable.bottom(int count)
Returns a PTable made up of the pairs in this PTable with the smallest
value field.
|
<K> PTable<K,S> |
PCollection.by(MapFn<S,K> extractKeyFn,
PType<K> keyType)
Apply the given map function to each element of this instance in order to
create a
PTable . |
<K> PTable<K,S> |
PCollection.by(String name,
MapFn<S,K> extractKeyFn,
PType<K> keyType)
Apply the given map function to each element of this instance in order to
create a
PTable . |
PTable<K,V> |
PTable.cache() |
PTable<K,V> |
PTable.cache(CachingOptions options) |
<U> PTable<K,Pair<Collection<V>,Collection<U>>> |
PTable.cogroup(PTable<K,U> other)
Co-group operation with the given table on common keys.
|
PTable<K,Collection<V>> |
PTable.collectValues()
Aggregate all of the values with the same key into a single key-value pair
in the returned PTable.
|
PTable<K,V> |
PGroupedTable.combineValues(Aggregator<V> aggregator)
Combine the values in each group using the given
Aggregator . |
PTable<K,V> |
PGroupedTable.combineValues(Aggregator<V> combineAggregator,
Aggregator<V> reduceAggregator)
Combine and reduces the values in each group using the given
Aggregator instances. |
PTable<K,V> |
PGroupedTable.combineValues(CombineFn<K,V> combineFn)
Combines the values of this grouping using the given
CombineFn . |
PTable<K,V> |
PGroupedTable.combineValues(CombineFn<K,V> combineFn,
CombineFn<K,V> reduceFn)
Combines and reduces the values of this grouping using the given
CombineFn instances. |
PTable<S,Long> |
PCollection.count()
Returns a
PTable instance that contains the counts of each unique
element of this PCollection. |
<K,V> PTable<K,V> |
Pipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype)
Creates a
PTable containing the values found in the given Iterable
using an implementation-specific distribution mechanism. |
<K,V> PTable<K,V> |
Pipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype,
CreateOptions options)
Creates a
PTable containing the values found in the given Iterable
using an implementation-specific distribution mechanism. |
<K,V> PTable<K,V> |
Pipeline.emptyPTable(PTableType<K,V> ptype)
Creates an empty
PTable of the given PTable Type . |
PTable<K,V> |
PTable.filter(FilterFn<Pair<K,V>> filterFn)
Apply the given filter function to this instance and return the resulting
PTable . |
PTable<K,V> |
PTable.filter(String name,
FilterFn<Pair<K,V>> filterFn)
Apply the given filter function to this instance and return the resulting
PTable . |
<U> PTable<K,Pair<V,U>> |
PTable.join(PTable<K,U> other)
Perform an inner join on this table and the one passed in as an argument on
their common keys.
|
<K2> PTable<K2,V> |
PTable.mapKeys(MapFn<K,K2> mapFn,
PType<K2> ptype)
Returns a
PTable that has the same values as this instance, but
uses the given function to map the keys. |
<K2> PTable<K2,V> |
PTable.mapKeys(String name,
MapFn<K,K2> mapFn,
PType<K2> ptype)
Returns a
PTable that has the same values as this instance, but
uses the given function to map the keys. |
<U> PTable<K,U> |
PGroupedTable.mapValues(MapFn<Iterable<V>,U> mapFn,
PType<U> ptype)
Maps the
Iterable<V> elements of each record to a new type. |
<U> PTable<K,U> |
PTable.mapValues(MapFn<V,U> mapFn,
PType<U> ptype)
Returns a
PTable that has the same keys as this instance, but
uses the given function to map the values. |
<U> PTable<K,U> |
PGroupedTable.mapValues(String name,
MapFn<Iterable<V>,U> mapFn,
PType<U> ptype)
Maps the
Iterable<V> elements of each record to a new type. |
<U> PTable<K,U> |
PTable.mapValues(String name,
MapFn<V,U> mapFn,
PType<U> ptype)
Returns a
PTable that has the same keys as this instance, but
uses the given function to map the values. |
<K,V> PTable<K,V> |
PCollection.parallelDo(DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type)
Similar to the other
parallelDo instance, but returns a
PTable instance instead of a PCollection . |
<K,V> PTable<K,V> |
PCollection.parallelDo(String name,
DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type)
Similar to the other
parallelDo instance, but returns a
PTable instance instead of a PCollection . |
<K,V> PTable<K,V> |
PCollection.parallelDo(String name,
DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type,
ParallelDoOptions options)
Similar to the other
parallelDo instance, but returns a
PTable instance instead of a PCollection . |
<K,V> PTable<K,V> |
Pipeline.read(TableSource<K,V> tableSource)
A version of the read method for
TableSource instances that map to
PTable s. |
<K,V> PTable<K,V> |
Pipeline.read(TableSource<K,V> tableSource,
String named)
A version of the read method for
TableSource instances that map to
PTable s. |
PTable<K,V> |
PTable.top(int count)
Returns a PTable made up of the pairs in this PTable with the largest value
field.
|
PTable<K,V> |
PGroupedTable.ungroup()
Convert this grouping back into a multimap.
|
PTable<K,V> |
PTable.union(PTable<K,V>... others)
Returns a
PTable instance that acts as the union of this
PTable and the input PTable s. |
PTable<K,V> |
PTable.union(PTable<K,V> other)
Returns a
PTable instance that acts as the union of this
PTable and the other PTable s. |
<K,V> PTable<K,V> |
Pipeline.unionTables(List<PTable<K,V>> tables) |
PTable<K,V> |
PTable.write(Target target)
Writes this
PTable to the given Target . |
PTable<K,V> |
PTable.write(Target target,
Target.WriteMode writeMode)
Writes this
PTable to the given Target , using the
given Target.WriteMode to handle existing targets. |
Modifier and Type | Method and Description |
---|---|
<U> PTable<K,Pair<Collection<V>,Collection<U>>> |
PTable.cogroup(PTable<K,U> other)
Co-group operation with the given table on common keys.
|
<U> PTable<K,Pair<V,U>> |
PTable.join(PTable<K,U> other)
Perform an inner join on this table and the one passed in as an argument on
their common keys.
|
PTable<K,V> |
PTable.union(PTable<K,V>... others)
Returns a
PTable instance that acts as the union of this
PTable and the input PTable s. |
PTable<K,V> |
PTable.union(PTable<K,V> other)
Returns a
PTable instance that acts as the union of this
PTable and the other PTable s. |
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
Pipeline.unionTables(List<PTable<K,V>> tables) |
Modifier and Type | Method and Description |
---|---|
static <K,V> PTable<K,V> |
Parse.parseTable(String groupName,
PCollection<String> input,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input
PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> . |
static <K,V> PTable<K,V> |
Parse.parseTable(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input
PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> that uses the given PTypeFamily . |
Modifier and Type | Method and Description |
---|---|
PTable<String,String> |
WordAggregationHBase.extractText(PTable<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.hbase.client.Result> words)
Extract information from hbase
|
Modifier and Type | Method and Description |
---|---|
PCollection<org.apache.hadoop.hbase.client.Put> |
WordAggregationHBase.createPut(PTable<String,String> extractedText)
Create puts in order to insert them in hbase.
|
PTable<String,String> |
WordAggregationHBase.extractText(PTable<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.hbase.client.Result> words)
Extract information from hbase
|
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
DistributedPipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype) |
<K,V> PTable<K,V> |
DistributedPipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype,
CreateOptions options) |
<K,V> PTable<K,V> |
DistributedPipeline.emptyPTable(PTableType<K,V> ptype) |
<K,V> PTable<K,V> |
DistributedPipeline.read(TableSource<K,V> source) |
<K,V> PTable<K,V> |
DistributedPipeline.read(TableSource<K,V> source,
String named) |
<K,V> PTable<K,V> |
DistributedPipeline.unionTables(List<PTable<K,V>> tables) |
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
DistributedPipeline.unionTables(List<PTable<K,V>> tables) |
Modifier and Type | Class and Description |
---|---|
class |
BaseDoTable<K,V> |
class |
BaseInputTable<K,V> |
class |
BaseUnionTable<K,V> |
class |
EmptyPTable<K,V> |
class |
PTableBase<K,V> |
Modifier and Type | Method and Description |
---|---|
PTable<K,V> |
PTableBase.bottom(int count) |
<K> PTable<K,S> |
PCollectionImpl.by(MapFn<S,K> mapFn,
PType<K> keyType) |
<K> PTable<K,S> |
PCollectionImpl.by(String name,
MapFn<S,K> mapFn,
PType<K> keyType) |
PTable<K,V> |
PTableBase.cache() |
PTable<K,V> |
PTableBase.cache(CachingOptions options) |
<U> PTable<K,Pair<Collection<V>,Collection<U>>> |
PTableBase.cogroup(PTable<K,U> other) |
PTable<K,Collection<V>> |
PTableBase.collectValues() |
PTable<K,V> |
BaseGroupedTable.combineValues(Aggregator<V> agg) |
PTable<K,V> |
BaseGroupedTable.combineValues(Aggregator<V> combineAgg,
Aggregator<V> reduceAgg) |
PTable<K,V> |
BaseGroupedTable.combineValues(CombineFn<K,V> combineFn) |
PTable<K,V> |
BaseGroupedTable.combineValues(CombineFn<K,V> combineFn,
CombineFn<K,V> reduceFn) |
PTable<S,Long> |
PCollectionImpl.count() |
<K,V> PTable<K,V> |
PCollectionFactory.createUnionTable(List<PTableBase<K,V>> internal) |
PTable<K,V> |
PTableBase.filter(FilterFn<Pair<K,V>> filterFn) |
PTable<K,V> |
PTableBase.filter(String name,
FilterFn<Pair<K,V>> filterFn) |
<U> PTable<K,Pair<V,U>> |
PTableBase.join(PTable<K,U> other) |
<K2> PTable<K2,V> |
PTableBase.mapKeys(MapFn<K,K2> mapFn,
PType<K2> ptype) |
<K2> PTable<K2,V> |
PTableBase.mapKeys(String name,
MapFn<K,K2> mapFn,
PType<K2> ptype) |
<U> PTable<K,U> |
BaseGroupedTable.mapValues(MapFn<Iterable<V>,U> mapFn,
PType<U> ptype) |
<U> PTable<K,U> |
PTableBase.mapValues(MapFn<V,U> mapFn,
PType<U> ptype) |
<U> PTable<K,U> |
BaseGroupedTable.mapValues(String name,
MapFn<Iterable<V>,U> mapFn,
PType<U> ptype) |
<U> PTable<K,U> |
PTableBase.mapValues(String name,
MapFn<V,U> mapFn,
PType<U> ptype) |
<K,V> PTable<K,V> |
PCollectionImpl.parallelDo(DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type) |
<K,V> PTable<K,V> |
PCollectionImpl.parallelDo(String name,
DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type) |
<K,V> PTable<K,V> |
PCollectionImpl.parallelDo(String name,
DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type,
ParallelDoOptions options) |
PTable<K,V> |
PTableBase.top(int count) |
PTable<K,V> |
BaseGroupedTable.ungroup() |
PTable<K,V> |
PTableBase.union(PTable<K,V>... others) |
PTable<K,V> |
PTableBase.union(PTable<K,V> other) |
PTable<K,V> |
PTableBase.write(Target target) |
PTable<K,V> |
PTableBase.write(Target target,
Target.WriteMode writeMode) |
Modifier and Type | Method and Description |
---|---|
<U> PTable<K,Pair<Collection<V>,Collection<U>>> |
PTableBase.cogroup(PTable<K,U> other) |
<U> PTable<K,Pair<V,U>> |
PTableBase.join(PTable<K,U> other) |
PTable<K,V> |
PTableBase.union(PTable<K,V>... others) |
PTable<K,V> |
PTableBase.union(PTable<K,V> other) |
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
MemPipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype) |
<K,V> PTable<K,V> |
MemPipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype,
CreateOptions options) |
<K,V> PTable<K,V> |
MemPipeline.emptyPTable(PTableType<K,V> ptype) |
<K,V> PTable<K,V> |
MemPipeline.read(TableSource<K,V> source) |
<K,V> PTable<K,V> |
MemPipeline.read(TableSource<K,V> source,
String named) |
static <S,T> PTable<S,T> |
MemPipeline.tableOf(Iterable<Pair<S,T>> pairs) |
static <S,T> PTable<S,T> |
MemPipeline.tableOf(S s,
T t,
Object... more) |
static <S,T> PTable<S,T> |
MemPipeline.typedTableOf(PTableType<S,T> ptype,
Iterable<Pair<S,T>> pairs) |
static <S,T> PTable<S,T> |
MemPipeline.typedTableOf(PTableType<S,T> ptype,
S s,
T t,
Object... more) |
<K,V> PTable<K,V> |
MemPipeline.unionTables(List<PTable<K,V>> tables) |
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
MemPipeline.unionTables(List<PTable<K,V>> tables) |
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
SparkPipeline.create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype,
CreateOptions options) |
<K,V> PTable<K,V> |
SparkPipeline.emptyPTable(PTableType<K,V> ptype) |
Modifier and Type | Class and Description |
---|---|
class |
CreatedTable<K,V>
Represents a Spark-based PTable that was created from a Java
Iterable of
key-value pairs. |
class |
DoTable<K,V> |
class |
InputTable<K,V> |
class |
UnionTable<K,V> |
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
SparkCollectFactory.createUnionTable(List<PTableBase<K,V>> internal) |
Modifier and Type | Method and Description |
---|---|
PTable<K,V> |
LTable.underlying()
Get the underlying
PTable for this LCollection |
Modifier and Type | Method and Description |
---|---|
<K,V> LTable<K,V> |
LCollectionFactory.wrap(PTable<K,V> collection)
Wrap a PTable into an LTable
|
static <K,V> LTable<K,V> |
Lambda.wrap(PTable<K,V> collection) |
Modifier and Type | Method and Description |
---|---|
static <K,V> PTable<K,V> |
PTables.asPTable(PCollection<Pair<K,V>> pcollect)
Convert the given
PCollection<Pair<K, V>> to a PTable<K, V> . |
static <K,U,V> PTable<K,TupleN> |
Cogroup.cogroup(int numReducers,
PTable<K,?> first,
PTable<K,?>... rest)
Co-groups an arbitrary number of
PTable arguments with a user-specified degree of parallelism
(a.k.a, number of reducers.) The largest table should come last in the ordering. |
static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> |
Cogroup.cogroup(int numReducers,
PTable<K,U> left,
PTable<K,V> right)
Co-groups the two
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K,V1,V2,V3> |
Cogroup.cogroup(int numReducers,
PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third)
Co-groups the three
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K,V1,V2,V3,V4> |
Cogroup.cogroup(int numReducers,
PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third,
PTable<K,V4> fourth)
Co-groups the three
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K> PTable<K,TupleN> |
Cogroup.cogroup(PTable<K,?> first,
PTable<K,?>... rest)
Co-groups an arbitrary number of
PTable arguments. |
static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> |
Cogroup.cogroup(PTable<K,U> left,
PTable<K,V> right)
Co-groups the two
PTable arguments. |
static <K,V1,V2,V3> |
Cogroup.cogroup(PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third)
Co-groups the three
PTable arguments. |
static <K,V1,V2,V3,V4> |
Cogroup.cogroup(PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third,
PTable<K,V4> fourth)
Co-groups the three
PTable arguments. |
static <K,V> PTable<K,Collection<V>> |
Aggregate.collectValues(PTable<K,V> collect) |
static <S> PTable<S,Long> |
Aggregate.count(PCollection<S> collect)
Returns a
PTable that contains the unique elements of this collection mapped to a count
of their occurrences. |
static <S> PTable<S,Long> |
Aggregate.count(PCollection<S> collect,
int numPartitions)
Returns a
PTable that contains the unique elements of this collection mapped to a count
of their occurrences. |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right,
int parallelism)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K,V> PTable<K,V> |
Distinct.distinct(PTable<K,V> input)
A
PTable<K, V> analogue of the distinct function. |
static <K,V> PTable<K,V> |
Distinct.distinct(PTable<K,V> input,
int flushEvery)
A
PTable<K, V> analogue of the distinct function. |
static <K,V extends Number> |
Quantiles.distributed(PTable<K,V> table,
double p1,
double... pn)
Calculate a set of quantiles for each key in a numerically-valued table.
|
static <K,U,V> PTable<K,Pair<U,V>> |
Join.fullJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a full outer join on the specified
PTable s. |
static <X> PTable<X,Long> |
TopList.globalToplist(PCollection<X> input)
Create a list of unique items in the input collection with their count, sorted descending by their frequency.
|
static <K,V extends Comparable> |
Quantiles.inMemory(PTable<K,V> table,
double p1,
double... pn)
Calculate a set of quantiles for each key in a numerically-valued table.
|
static <K,U,V> PTable<K,Pair<U,V>> |
Join.innerJoin(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.join(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.leftJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a left outer join on the specified
PTable s. |
static <K1,V1,K2 extends org.apache.hadoop.io.Writable,V2 extends org.apache.hadoop.io.Writable> |
Mapreduce.map(PTable<K1,V1> input,
Class<? extends org.apache.hadoop.mapreduce.Mapper<K1,V1,K2,V2>> mapperClass,
Class<K2> keyClass,
Class<V2> valueClass) |
static <K1,V1,K2 extends org.apache.hadoop.io.Writable,V2 extends org.apache.hadoop.io.Writable> |
Mapred.map(PTable<K1,V1> input,
Class<? extends org.apache.hadoop.mapred.Mapper<K1,V1,K2,V2>> mapperClass,
Class<K2> keyClass,
Class<V2> valueClass) |
static <K1,K2,V> PTable<K2,V> |
PTables.mapKeys(PTable<K1,V> ptable,
MapFn<K1,K2> mapFn,
PType<K2> ptype)
Maps a
PTable<K1, V> to a PTable<K2, V> using the given MapFn<K1, K2> on
the keys of the PTable . |
static <K1,K2,V> PTable<K2,V> |
PTables.mapKeys(String name,
PTable<K1,V> ptable,
MapFn<K1,K2> mapFn,
PType<K2> ptype)
Maps a
PTable<K1, V> to a PTable<K2, V> using the given MapFn<K1, K2> on
the keys of the PTable . |
static <K,U,V> PTable<K,V> |
PTables.mapValues(PGroupedTable<K,U> ptable,
MapFn<Iterable<U>,V> mapFn,
PType<V> ptype)
An analogue of the
mapValues function for PGroupedTable<K, U> collections. |
static <K,U,V> PTable<K,V> |
PTables.mapValues(PTable<K,U> ptable,
MapFn<U,V> mapFn,
PType<V> ptype)
Maps a
PTable<K, U> to a PTable<K, V> using the given MapFn<U, V> on
the values of the PTable . |
static <K,U,V> PTable<K,V> |
PTables.mapValues(String name,
PGroupedTable<K,U> ptable,
MapFn<Iterable<U>,V> mapFn,
PType<V> ptype)
An analogue of the
mapValues function for PGroupedTable<K, U> collections. |
static <K,U,V> PTable<K,V> |
PTables.mapValues(String name,
PTable<K,U> ptable,
MapFn<U,V> mapFn,
PType<V> ptype)
Maps a
PTable<K, U> to a PTable<K, V> using the given MapFn<U, V> on
the values of the PTable . |
static <K,V extends Number> |
Average.meanValue(PTable<K,V> table)
Calculate the mean average value by key for a table with numeric values.
|
static <K> PTable<K,Long> |
TopList.negateCounts(PTable<K,Long> table)
When creating toplists, it is often required to sort by count descending.
|
static <K1,V1,K2 extends org.apache.hadoop.io.Writable,V2 extends org.apache.hadoop.io.Writable> |
Mapreduce.reduce(PGroupedTable<K1,V1> input,
Class<? extends org.apache.hadoop.mapreduce.Reducer<K1,V1,K2,V2>> reducerClass,
Class<K2> keyClass,
Class<V2> valueClass) |
static <K1,V1,K2 extends org.apache.hadoop.io.Writable,V2 extends org.apache.hadoop.io.Writable> |
Mapred.reduce(PGroupedTable<K1,V1> input,
Class<? extends org.apache.hadoop.mapred.Reducer<K1,V1,K2,V2>> reducerClass,
Class<K2> keyClass,
Class<V2> valueClass) |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.rightJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a right outer join on the specified
PTable s. |
static <K,V> PTable<K,V> |
Sample.sample(PTable<K,V> input,
double probability)
A
PTable<K, V> analogue of the sample function. |
static <K,V> PTable<K,V> |
Sample.sample(PTable<K,V> input,
Long seed,
double probability)
A
PTable<K, V> analogue of the sample function, with the seed argument
exposed for testing purposes. |
static <K,V> PTable<K,V> |
Sort.sort(PTable<K,V> table)
Sorts the
PTable using the natural ordering of its keys in ascending order. |
static <K,V> PTable<K,V> |
Sort.sort(PTable<K,V> table,
int numReducers,
Sort.Order key)
Sorts the
PTable using the natural ordering of its keys in the
order specified with a client-specified number of reducers. |
static <K,V> PTable<K,V> |
Sort.sort(PTable<K,V> table,
Sort.Order key)
Sorts the
PTable using the natural ordering of its keys with the given Order . |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> . |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype,
int numReducers)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> , using
the given number of reducers. |
static <K,V> PTable<V,K> |
PTables.swapKeyValue(PTable<K,V> table)
Swap the key and value part of a table.
|
static <K,V> PTable<K,V> |
Aggregate.top(PTable<K,V> ptable,
int limit,
boolean maximize)
Selects the top N pairs from the given table, with sorting being performed on the values (i.e.
|
static <X,Y> PTable<X,Collection<Pair<Long,Y>>> |
TopList.topNYbyX(PTable<X,Y> input,
int n)
Create a top-list of elements in the provided PTable, categorised by the key of the input table and using the count
of the value part of the input table.
|
Modifier and Type | Method and Description |
---|---|
static <K,U,V> PTable<K,TupleN> |
Cogroup.cogroup(int numReducers,
PTable<K,?> first,
PTable<K,?>... rest)
Co-groups an arbitrary number of
PTable arguments with a user-specified degree of parallelism
(a.k.a, number of reducers.) The largest table should come last in the ordering. |
static <K,U,V> PTable<K,TupleN> |
Cogroup.cogroup(int numReducers,
PTable<K,?> first,
PTable<K,?>... rest)
Co-groups an arbitrary number of
PTable arguments with a user-specified degree of parallelism
(a.k.a, number of reducers.) The largest table should come last in the ordering. |
static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> |
Cogroup.cogroup(int numReducers,
PTable<K,U> left,
PTable<K,V> right)
Co-groups the two
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> |
Cogroup.cogroup(int numReducers,
PTable<K,U> left,
PTable<K,V> right)
Co-groups the two
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K,V1,V2,V3> |
Cogroup.cogroup(int numReducers,
PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third)
Co-groups the three
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K,V1,V2,V3> |
Cogroup.cogroup(int numReducers,
PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third)
Co-groups the three
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K,V1,V2,V3> |
Cogroup.cogroup(int numReducers,
PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third)
Co-groups the three
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K,V1,V2,V3,V4> |
Cogroup.cogroup(int numReducers,
PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third,
PTable<K,V4> fourth)
Co-groups the three
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K,V1,V2,V3,V4> |
Cogroup.cogroup(int numReducers,
PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third,
PTable<K,V4> fourth)
Co-groups the three
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K,V1,V2,V3,V4> |
Cogroup.cogroup(int numReducers,
PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third,
PTable<K,V4> fourth)
Co-groups the three
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K,V1,V2,V3,V4> |
Cogroup.cogroup(int numReducers,
PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third,
PTable<K,V4> fourth)
Co-groups the three
PTable arguments with a user-specified degree of parallelism (a.k.a, number of
reducers.) |
static <K> PTable<K,TupleN> |
Cogroup.cogroup(PTable<K,?> first,
PTable<K,?>... rest)
Co-groups an arbitrary number of
PTable arguments. |
static <K> PTable<K,TupleN> |
Cogroup.cogroup(PTable<K,?> first,
PTable<K,?>... rest)
Co-groups an arbitrary number of
PTable arguments. |
static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> |
Cogroup.cogroup(PTable<K,U> left,
PTable<K,V> right)
Co-groups the two
PTable arguments. |
static <K,U,V> PTable<K,Pair<Collection<U>,Collection<V>>> |
Cogroup.cogroup(PTable<K,U> left,
PTable<K,V> right)
Co-groups the two
PTable arguments. |
static <K,V1,V2,V3> |
Cogroup.cogroup(PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third)
Co-groups the three
PTable arguments. |
static <K,V1,V2,V3> |
Cogroup.cogroup(PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third)
Co-groups the three
PTable arguments. |
static <K,V1,V2,V3> |
Cogroup.cogroup(PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third)
Co-groups the three
PTable arguments. |
static <K,V1,V2,V3,V4> |
Cogroup.cogroup(PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third,
PTable<K,V4> fourth)
Co-groups the three
PTable arguments. |
static <K,V1,V2,V3,V4> |
Cogroup.cogroup(PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third,
PTable<K,V4> fourth)
Co-groups the three
PTable arguments. |
static <K,V1,V2,V3,V4> |
Cogroup.cogroup(PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third,
PTable<K,V4> fourth)
Co-groups the three
PTable arguments. |
static <K,V1,V2,V3,V4> |
Cogroup.cogroup(PTable<K,V1> first,
PTable<K,V2> second,
PTable<K,V3> third,
PTable<K,V4> fourth)
Co-groups the three
PTable arguments. |
static <K,V> PTable<K,Collection<V>> |
Aggregate.collectValues(PTable<K,V> collect) |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right,
int parallelism)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K1,K2,U,V> |
Cartesian.cross(PTable<K1,U> left,
PTable<K2,V> right,
int parallelism)
Performs a full cross join on the specified
PTable s (using the same
strategy as Pig's CROSS operator). |
static <K,V> PTable<K,V> |
Distinct.distinct(PTable<K,V> input)
A
PTable<K, V> analogue of the distinct function. |
static <K,V> PTable<K,V> |
Distinct.distinct(PTable<K,V> input,
int flushEvery)
A
PTable<K, V> analogue of the distinct function. |
static <K,V extends Number> |
Quantiles.distributed(PTable<K,V> table,
double p1,
double... pn)
Calculate a set of quantiles for each key in a numerically-valued table.
|
static <K,U,V> PTable<K,Pair<U,V>> |
Join.fullJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a full outer join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.fullJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a full outer join on the specified
PTable s. |
static <T,N extends Number> |
Sample.groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
int[] sampleSizes)
The most general purpose of the weighted reservoir sampling patterns that allows us to choose
a random sample of elements for each of N input groups.
|
static <T,N extends Number> |
Sample.groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
int[] sampleSizes,
Long seed)
Same as the other groupedWeightedReservoirSample method, but include a seed for testing
purposes.
|
static <K,V extends Comparable> |
Quantiles.inMemory(PTable<K,V> table,
double p1,
double... pn)
Calculate a set of quantiles for each key in a numerically-valued table.
|
static <K,U,V> PTable<K,Pair<U,V>> |
Join.innerJoin(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.innerJoin(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.join(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.join(PTable<K,U> left,
PTable<K,V> right)
Performs an inner join on the specified
PTable s. |
static <K,V> PCollection<K> |
PTables.keys(PTable<K,V> ptable)
Extract the keys from the given
PTable<K, V> as a PCollection<K> . |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.leftJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a left outer join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.leftJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a left outer join on the specified
PTable s. |
static <K1,V1,K2 extends org.apache.hadoop.io.Writable,V2 extends org.apache.hadoop.io.Writable> |
Mapreduce.map(PTable<K1,V1> input,
Class<? extends org.apache.hadoop.mapreduce.Mapper<K1,V1,K2,V2>> mapperClass,
Class<K2> keyClass,
Class<V2> valueClass) |
static <K1,V1,K2 extends org.apache.hadoop.io.Writable,V2 extends org.apache.hadoop.io.Writable> |
Mapred.map(PTable<K1,V1> input,
Class<? extends org.apache.hadoop.mapred.Mapper<K1,V1,K2,V2>> mapperClass,
Class<K2> keyClass,
Class<V2> valueClass) |
static <K1,K2,V> PTable<K2,V> |
PTables.mapKeys(PTable<K1,V> ptable,
MapFn<K1,K2> mapFn,
PType<K2> ptype)
Maps a
PTable<K1, V> to a PTable<K2, V> using the given MapFn<K1, K2> on
the keys of the PTable . |
static <K1,K2,V> PTable<K2,V> |
PTables.mapKeys(String name,
PTable<K1,V> ptable,
MapFn<K1,K2> mapFn,
PType<K2> ptype)
Maps a
PTable<K1, V> to a PTable<K2, V> using the given MapFn<K1, K2> on
the keys of the PTable . |
static <K,U,V> PTable<K,V> |
PTables.mapValues(PTable<K,U> ptable,
MapFn<U,V> mapFn,
PType<V> ptype)
Maps a
PTable<K, U> to a PTable<K, V> using the given MapFn<U, V> on
the values of the PTable . |
static <K,U,V> PTable<K,V> |
PTables.mapValues(String name,
PTable<K,U> ptable,
MapFn<U,V> mapFn,
PType<V> ptype)
Maps a
PTable<K, U> to a PTable<K, V> using the given MapFn<U, V> on
the values of the PTable . |
static <K,V extends Number> |
Average.meanValue(PTable<K,V> table)
Calculate the mean average value by key for a table with numeric values.
|
static <K> PTable<K,Long> |
TopList.negateCounts(PTable<K,Long> table)
When creating toplists, it is often required to sort by count descending.
|
static <K,U,V> PTable<K,Pair<U,V>> |
Join.rightJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a right outer join on the specified
PTable s. |
static <K,U,V> PTable<K,Pair<U,V>> |
Join.rightJoin(PTable<K,U> left,
PTable<K,V> right)
Performs a right outer join on the specified
PTable s. |
static <K,V> PTable<K,V> |
Sample.sample(PTable<K,V> input,
double probability)
A
PTable<K, V> analogue of the sample function. |
static <K,V> PTable<K,V> |
Sample.sample(PTable<K,V> input,
Long seed,
double probability)
A
PTable<K, V> analogue of the sample function, with the seed argument
exposed for testing purposes. |
static <K,V> PTable<K,V> |
Sort.sort(PTable<K,V> table)
Sorts the
PTable using the natural ordering of its keys in ascending order. |
static <K,V> PTable<K,V> |
Sort.sort(PTable<K,V> table,
int numReducers,
Sort.Order key)
Sorts the
PTable using the natural ordering of its keys in the
order specified with a client-specified number of reducers. |
static <K,V> PTable<K,V> |
Sort.sort(PTable<K,V> table,
Sort.Order key)
Sorts the
PTable using the natural ordering of its keys with the given Order . |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> . |
static <K,V1,V2,U,V> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
PTableType<U,V> ptype,
int numReducers)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V> , using
the given number of reducers. |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> . |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype,
int numReducers)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> , using
the given number of reducers. |
static <K,V> PTable<V,K> |
PTables.swapKeyValue(PTable<K,V> table)
Swap the key and value part of a table.
|
static <K,V> PTable<K,V> |
Aggregate.top(PTable<K,V> ptable,
int limit,
boolean maximize)
Selects the top N pairs from the given table, with sorting being performed on the values (i.e.
|
static <X,Y> PTable<X,Collection<Pair<Long,Y>>> |
TopList.topNYbyX(PTable<X,Y> input,
int n)
Create a top-list of elements in the provided PTable, categorised by the key of the input table and using the count
of the value part of the input table.
|
static <K,V> PCollection<V> |
PTables.values(PTable<K,V> ptable)
Extract the values from the given
PTable<K, V> as a PCollection<V> . |
Modifier and Type | Method and Description |
---|---|
PTable<K,Pair<U,V>> |
DefaultJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinFn<K,U,V> joinFn)
Perform a default join on the given
PTable instances using a user-specified JoinFn . |
PTable<K,Pair<U,V>> |
ShardedJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
MapsideJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
JoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType)
Join two tables with the given join type.
|
PTable<K,Pair<U,V>> |
DefaultJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
BloomFilterJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
Modifier and Type | Method and Description |
---|---|
PTable<K,Pair<U,V>> |
DefaultJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinFn<K,U,V> joinFn)
Perform a default join on the given
PTable instances using a user-specified JoinFn . |
PTable<K,Pair<U,V>> |
DefaultJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinFn<K,U,V> joinFn)
Perform a default join on the given
PTable instances using a user-specified JoinFn . |
PTable<K,Pair<U,V>> |
ShardedJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
ShardedJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
MapsideJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
MapsideJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
JoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType)
Join two tables with the given join type.
|
PTable<K,Pair<U,V>> |
JoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType)
Join two tables with the given join type.
|
PTable<K,Pair<U,V>> |
DefaultJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
DefaultJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
BloomFilterJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
PTable<K,Pair<U,V>> |
BloomFilterJoinStrategy.join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType) |
static <K,U,V,T> PCollection<T> |
OneToManyJoin.oneToManyJoin(PTable<K,U> left,
PTable<K,V> right,
DoFn<Pair<U,Iterable<V>>,T> postProcessFn,
PType<T> ptype)
Performs a join on two tables, where the left table only contains a single
value per key.
|
static <K,U,V,T> PCollection<T> |
OneToManyJoin.oneToManyJoin(PTable<K,U> left,
PTable<K,V> right,
DoFn<Pair<U,Iterable<V>>,T> postProcessFn,
PType<T> ptype)
Performs a join on two tables, where the left table only contains a single
value per key.
|
static <K,U,V,T> PCollection<T> |
OneToManyJoin.oneToManyJoin(PTable<K,U> left,
PTable<K,V> right,
DoFn<Pair<U,Iterable<V>>,T> postProcessFn,
PType<T> ptype,
int numReducers)
Supports a user-specified number of reducers for the one-to-many join.
|
static <K,U,V,T> PCollection<T> |
OneToManyJoin.oneToManyJoin(PTable<K,U> left,
PTable<K,V> right,
DoFn<Pair<U,Iterable<V>>,T> postProcessFn,
PType<T> ptype,
int numReducers)
Supports a user-specified number of reducers for the one-to-many join.
|
Modifier and Type | Method and Description |
---|---|
<K,V> PTable<K,V> |
CrunchTool.read(TableSource<K,V> tableSource) |
Copyright © 2016 The Apache Software Foundation. All rights reserved.