|
|||||||||
PREV NEXT | FRAMES NO FRAMES |
Packages that use PCollection | |
---|---|
org.apache.crunch | Client-facing API and core abstractions. |
org.apache.crunch.contrib.bloomfilter | Support for creating Bloom Filters. |
org.apache.crunch.contrib.text | |
org.apache.crunch.examples | Example applications demonstrating various aspects of Crunch. |
org.apache.crunch.impl.mem | In-memory Pipeline implementation for rapid prototyping and testing. |
org.apache.crunch.impl.mr | A Pipeline implementation that runs on Hadoop MapReduce. |
org.apache.crunch.lib | Joining, sorting, aggregating, and other commonly used functionality. |
org.apache.crunch.lib.join | Inner and outer joins on collections. |
org.apache.crunch.util | An assorted set of utilities. |
Uses of PCollection in org.apache.crunch |
---|
Subinterfaces of PCollection in org.apache.crunch | |
---|---|
interface |
PGroupedTable<K,V>
The Crunch representation of a grouped PTable , which corresponds to the output of
the shuffle phase of a MapReduce job. |
interface |
PTable<K,V>
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
Methods in org.apache.crunch that return PCollection | ||
---|---|---|
PCollection<S> |
PCollection.filter(FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting PCollection . |
|
PCollection<S> |
PCollection.filter(String name,
FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting PCollection . |
|
PCollection<K> |
PTable.keys()
Returns a PCollection made up of the keys in this PTable. |
|
|
PCollection.parallelDo(DoFn<S,T> doFn,
PType<T> type)
Applies the given doFn to the elements of this PCollection and
returns a new PCollection that is the output of this processing. |
|
|
PCollection.parallelDo(String name,
DoFn<S,T> doFn,
PType<T> type)
Applies the given doFn to the elements of this PCollection and
returns a new PCollection that is the output of this processing. |
|
|
PCollection.parallelDo(String name,
DoFn<S,T> doFn,
PType<T> type,
ParallelDoOptions options)
Applies the given doFn to the elements of this PCollection and
returns a new PCollection that is the output of this processing. |
|
|
Pipeline.read(Source<T> source)
Converts the given Source into a PCollection that is
available to jobs run using this Pipeline instance. |
|
PCollection<String> |
Pipeline.readTextFile(String pathName)
A convenience method for reading a text file. |
|
PCollection<S> |
PCollection.union(PCollection<S>... collections)
Returns a PCollection instance that acts as the union of this
PCollection and the input PCollection s. |
|
PCollection<S> |
PCollection.union(PCollection<S> other)
Returns a PCollection instance that acts as the union of this
PCollection and the given PCollection . |
|
PCollection<V> |
PTable.values()
Returns a PCollection made up of the values in this PTable. |
|
PCollection<S> |
PCollection.write(Target target)
Write the contents of this PCollection to the given Target ,
using the storage format specified by the target. |
|
PCollection<S> |
PCollection.write(Target target,
Target.WriteMode writeMode)
Write the contents of this PCollection to the given Target ,
using the given Target.WriteMode to handle existing
targets. |
Methods in org.apache.crunch with parameters of type PCollection | ||
---|---|---|
|
Pipeline.materialize(PCollection<T> pcollection)
Create the given PCollection and read the data it contains into the returned Collection instance for client use. |
|
PCollection<S> |
PCollection.union(PCollection<S>... collections)
Returns a PCollection instance that acts as the union of this
PCollection and the input PCollection s. |
|
PCollection<S> |
PCollection.union(PCollection<S> other)
Returns a PCollection instance that acts as the union of this
PCollection and the given PCollection . |
|
void |
Pipeline.write(PCollection<?> collection,
Target target)
Write the given collection to the given target on the next pipeline run. |
|
void |
Pipeline.write(PCollection<?> collection,
Target target,
Target.WriteMode writeMode)
Write the contents of the PCollection to the given Target ,
using the storage format specified by the target and the given
WriteMode for cases where the referenced Target
already exists. |
|
|
Pipeline.writeTextFile(PCollection<T> collection,
String pathName)
A convenience method for writing a text file. |
Uses of PCollection in org.apache.crunch.contrib.bloomfilter |
---|
Methods in org.apache.crunch.contrib.bloomfilter with parameters of type PCollection | ||
---|---|---|
static
|
BloomFilterFactory.createFilter(PCollection<T> collection,
BloomFilterFn<T> filterFn)
|
Uses of PCollection in org.apache.crunch.contrib.text |
---|
Methods in org.apache.crunch.contrib.text that return PCollection | ||
---|---|---|
static
|
Parse.parse(String groupName,
PCollection<String> input,
Extractor<T> extractor)
Parses the lines of the input PCollection<String> and returns a PCollection<T> using
the given Extractor<T> . |
|
static
|
Parse.parse(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<T> extractor)
Parses the lines of the input PCollection<String> and returns a PCollection<T> using
the given Extractor<T> that uses the given PTypeFamily . |
Methods in org.apache.crunch.contrib.text with parameters of type PCollection | ||
---|---|---|
static
|
Parse.parse(String groupName,
PCollection<String> input,
Extractor<T> extractor)
Parses the lines of the input PCollection<String> and returns a PCollection<T> using
the given Extractor<T> . |
|
static
|
Parse.parse(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<T> extractor)
Parses the lines of the input PCollection<String> and returns a PCollection<T> using
the given Extractor<T> that uses the given PTypeFamily . |
|
static
|
Parse.parseTable(String groupName,
PCollection<String> input,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> . |
|
static
|
Parse.parseTable(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> that uses the given PTypeFamily . |
Uses of PCollection in org.apache.crunch.examples |
---|
Methods in org.apache.crunch.examples that return PCollection | |
---|---|
PCollection<org.apache.hadoop.hbase.client.Put> |
WordAggregationHBase.createPut(PTable<String,String> extractedText)
Create puts in order to insert them in hbase. |
Uses of PCollection in org.apache.crunch.impl.mem |
---|
Methods in org.apache.crunch.impl.mem that return PCollection | ||
---|---|---|
static
|
MemPipeline.collectionOf(Iterable<T> collect)
|
|
static
|
MemPipeline.collectionOf(T... ts)
|
|
|
MemPipeline.read(Source<T> source)
|
|
PCollection<String> |
MemPipeline.readTextFile(String pathName)
|
|
static
|
MemPipeline.typedCollectionOf(PType<T> ptype,
Iterable<T> collect)
|
|
static
|
MemPipeline.typedCollectionOf(PType<T> ptype,
T... ts)
|
Methods in org.apache.crunch.impl.mem with parameters of type PCollection | ||
---|---|---|
|
MemPipeline.materialize(PCollection<T> pcollection)
|
|
void |
MemPipeline.write(PCollection<?> collection,
Target target)
|
|
void |
MemPipeline.write(PCollection<?> collection,
Target target,
Target.WriteMode writeMode)
|
|
|
MemPipeline.writeTextFile(PCollection<T> collection,
String pathName)
|
Uses of PCollection in org.apache.crunch.impl.mr |
---|
Methods in org.apache.crunch.impl.mr that return PCollection | ||
---|---|---|
|
MRPipeline.read(Source<S> source)
|
|
PCollection<String> |
MRPipeline.readTextFile(String pathName)
|
Methods in org.apache.crunch.impl.mr with parameters of type PCollection | ||
---|---|---|
|
MRPipeline.getMaterializeSourceTarget(PCollection<T> pcollection)
Retrieve a ReadableSourceTarget that provides access to the contents of a PCollection . |
|
|
MRPipeline.materialize(PCollection<T> pcollection)
|
|
void |
MRPipeline.write(PCollection<?> pcollection,
Target target)
|
|
void |
MRPipeline.write(PCollection<?> pcollection,
Target target,
Target.WriteMode writeMode)
|
|
|
MRPipeline.writeTextFile(PCollection<T> pcollection,
String pathName)
|
Uses of PCollection in org.apache.crunch.lib |
---|
Methods in org.apache.crunch.lib that return PCollection | ||
---|---|---|
static
|
Set.comm(PCollection<T> coll1,
PCollection<T> coll2)
Find the elements that are common to two sets, like the Unix comm utility. |
|
static
|
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified PCollection s (using the
same strategy as Pig's CROSS operator). |
|
static
|
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified PCollection s (using the
same strategy as Pig's CROSS operator). |
|
static
|
Set.difference(PCollection<T> coll1,
PCollection<T> coll2)
Compute the set difference between two sets of elements. |
|
static
|
Distinct.distinct(PCollection<S> input)
Construct a new PCollection that contains the unique elements of a
given input PCollection . |
|
static
|
Distinct.distinct(PCollection<S> input,
int flushEvery)
A distinct operation that gives the client more control over how frequently
elements are flushed to disk in order to allow control over performance or
memory consumption. |
|
static
|
Sample.groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
int[] sampleSizes)
The most general purpose of the weighted reservoir sampling patterns that allows us to choose a random sample of elements for each of N input groups. |
|
static
|
Sample.groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
int[] sampleSizes,
Long seed)
Same as the other groupedWeightedReservoirSample method, but include a seed for testing purposes. |
|
static
|
Set.intersection(PCollection<T> coll1,
PCollection<T> coll2)
Compute the intersection of two sets of elements. |
|
static
|
PTables.keys(PTable<K,V> ptable)
Extract the keys from the given PTable<K, V> as a PCollection<K> . |
|
static
|
Sample.reservoirSample(PCollection<T> input,
int sampleSize)
Select a fixed number of elements from the given PCollection with each element
equally likely to be included in the sample. |
|
static
|
Sample.reservorSample(PCollection<T> input,
int sampleSize,
Long seed)
A version of the reservoir sampling algorithm that uses a given seed, primarily for testing purposes. |
|
static
|
Sample.sample(PCollection<S> input,
double probability)
Output records from the given PCollection with the given probability. |
|
static
|
Sample.sample(PCollection<S> input,
Long seed,
double probability)
Output records from the given PCollection using a given seed. |
|
static
|
Shard.shard(PCollection<T> pc,
int numPartitions)
Creates a PCollection<T> that has the same contents as its input argument but will
be written to a fixed number of output files. |
|
static
|
Sort.sort(PCollection<T> collection)
Sorts the PCollection using the natural ordering of its elements in ascending order. |
|
static
|
Sort.sort(PCollection<T> collection,
int numReducers,
Sort.Order order)
Sorts the PCollection using the natural ordering of its elements in
the order specified using the given number of reducers. |
|
static
|
Sort.sort(PCollection<T> collection,
Sort.Order order)
Sorts the PCollection using the natural order of its elements with the given Order . |
|
static
|
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> . |
|
static
|
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype,
int numReducers)
Perform a secondary sort on the given PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> , using
the given number of reducers. |
|
static
|
Sort.sortPairs(PCollection<Pair<U,V>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of Pair s using the specified column
ordering. |
|
static
|
Sort.sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of Tuple4 s using the specified column
ordering. |
|
static
|
Sort.sortTriples(PCollection<Tuple3<V1,V2,V3>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of Tuple3 s using the specified column
ordering. |
|
static
|
Sort.sortTuples(PCollection<T> collection,
int numReducers,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of TupleN s using the specified column
ordering and a client-specified number of reducers. |
|
static
|
Sort.sortTuples(PCollection<T> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of tuples using the specified column ordering. |
|
static
|
PTables.values(PTable<K,V> ptable)
Extract the values from the given PTable<K, V> as a PCollection<V> . |
|
static
|
Sample.weightedReservoirSample(PCollection<Pair<T,N>> input,
int sampleSize)
Selects a weighted sample of the elements of the given PCollection , where the second term in
the input Pair is a numerical weight. |
|
static
|
Sample.weightedReservoirSample(PCollection<Pair<T,N>> input,
int sampleSize,
Long seed)
The weighted reservoir sampling function with the seed term exposed for testing purposes. |
Methods in org.apache.crunch.lib that return types with arguments of type PCollection | ||
---|---|---|
static
|
Channels.split(PCollection<Pair<T,U>> pCollection)
Splits a PCollection of any Pair of objects into a Pair of
PCollection}, to allow for the output of a DoFn to be handled using
separate channels. |
|
static
|
Channels.split(PCollection<Pair<T,U>> pCollection)
Splits a PCollection of any Pair of objects into a Pair of
PCollection}, to allow for the output of a DoFn to be handled using
separate channels. |
|
static
|
Channels.split(PCollection<Pair<T,U>> pCollection,
PType<T> firstPType,
PType<U> secondPType)
Splits a PCollection of any Pair of objects into a Pair of
PCollection}, to allow for the output of a DoFn to be handled using
separate channels. |
|
static
|
Channels.split(PCollection<Pair<T,U>> pCollection,
PType<T> firstPType,
PType<U> secondPType)
Splits a PCollection of any Pair of objects into a Pair of
PCollection}, to allow for the output of a DoFn to be handled using
separate channels. |
Methods in org.apache.crunch.lib with parameters of type PCollection | ||
---|---|---|
static
|
PTables.asPTable(PCollection<Pair<K,V>> pcollect)
Convert the given PCollection<Pair<K, V>> to a PTable<K, V> . |
|
static
|
Set.comm(PCollection<T> coll1,
PCollection<T> coll2)
Find the elements that are common to two sets, like the Unix comm utility. |
|
static
|
Set.comm(PCollection<T> coll1,
PCollection<T> coll2)
Find the elements that are common to two sets, like the Unix comm utility. |
|
static
|
Aggregate.count(PCollection<S> collect)
Returns a PTable that contains the unique elements of this collection mapped to a count
of their occurrences. |
|
static
|
Aggregate.count(PCollection<S> collect,
int numPartitions)
Returns a PTable that contains the unique elements of this collection mapped to a count
of their occurrences. |
|
static
|
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified PCollection s (using the
same strategy as Pig's CROSS operator). |
|
static
|
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified PCollection s (using the
same strategy as Pig's CROSS operator). |
|
static
|
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified PCollection s (using the
same strategy as Pig's CROSS operator). |
|
static
|
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified PCollection s (using the
same strategy as Pig's CROSS operator). |
|
static
|
Set.difference(PCollection<T> coll1,
PCollection<T> coll2)
Compute the set difference between two sets of elements. |
|
static
|
Set.difference(PCollection<T> coll1,
PCollection<T> coll2)
Compute the set difference between two sets of elements. |
|
static
|
Distinct.distinct(PCollection<S> input)
Construct a new PCollection that contains the unique elements of a
given input PCollection . |
|
static
|
Distinct.distinct(PCollection<S> input,
int flushEvery)
A distinct operation that gives the client more control over how frequently
elements are flushed to disk in order to allow control over performance or
memory consumption. |
|
static
|
Set.intersection(PCollection<T> coll1,
PCollection<T> coll2)
Compute the intersection of two sets of elements. |
|
static
|
Set.intersection(PCollection<T> coll1,
PCollection<T> coll2)
Compute the intersection of two sets of elements. |
|
static
|
Aggregate.length(PCollection<S> collect)
Returns the number of elements in the provided PCollection. |
|
static
|
Aggregate.max(PCollection<S> collect)
Returns the largest numerical element from the input collection. |
|
static
|
Aggregate.min(PCollection<S> collect)
Returns the smallest numerical element from the input collection. |
|
static
|
Sample.reservoirSample(PCollection<T> input,
int sampleSize)
Select a fixed number of elements from the given PCollection with each element
equally likely to be included in the sample. |
|
static
|
Sample.reservorSample(PCollection<T> input,
int sampleSize,
Long seed)
A version of the reservoir sampling algorithm that uses a given seed, primarily for testing purposes. |
|
static
|
Sample.sample(PCollection<S> input,
double probability)
Output records from the given PCollection with the given probability. |
|
static
|
Sample.sample(PCollection<S> input,
Long seed,
double probability)
Output records from the given PCollection using a given seed. |
|
static
|
Shard.shard(PCollection<T> pc,
int numPartitions)
Creates a PCollection<T> that has the same contents as its input argument but will
be written to a fixed number of output files. |
|
static
|
Sort.sort(PCollection<T> collection)
Sorts the PCollection using the natural ordering of its elements in ascending order. |
|
static
|
Sort.sort(PCollection<T> collection,
int numReducers,
Sort.Order order)
Sorts the PCollection using the natural ordering of its elements in
the order specified using the given number of reducers. |
|
static
|
Sort.sort(PCollection<T> collection,
Sort.Order order)
Sorts the PCollection using the natural order of its elements with the given Order . |
|
static
|
Sort.sortPairs(PCollection<Pair<U,V>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of Pair s using the specified column
ordering. |
|
static
|
Sort.sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of Tuple4 s using the specified column
ordering. |
|
static
|
Sort.sortTriples(PCollection<Tuple3<V1,V2,V3>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of Tuple3 s using the specified column
ordering. |
|
static
|
Sort.sortTuples(PCollection<T> collection,
int numReducers,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of TupleN s using the specified column
ordering and a client-specified number of reducers. |
|
static
|
Sort.sortTuples(PCollection<T> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of tuples using the specified column ordering. |
|
static
|
Channels.split(PCollection<Pair<T,U>> pCollection)
Splits a PCollection of any Pair of objects into a Pair of
PCollection}, to allow for the output of a DoFn to be handled using
separate channels. |
|
static
|
Channels.split(PCollection<Pair<T,U>> pCollection,
PType<T> firstPType,
PType<U> secondPType)
Splits a PCollection of any Pair of objects into a Pair of
PCollection}, to allow for the output of a DoFn to be handled using
separate channels. |
|
static
|
Sample.weightedReservoirSample(PCollection<Pair<T,N>> input,
int sampleSize)
Selects a weighted sample of the elements of the given PCollection , where the second term in
the input Pair is a numerical weight. |
|
static
|
Sample.weightedReservoirSample(PCollection<Pair<T,N>> input,
int sampleSize,
Long seed)
The weighted reservoir sampling function with the seed term exposed for testing purposes. |
Uses of PCollection in org.apache.crunch.lib.join |
---|
Methods in org.apache.crunch.lib.join that return PCollection | ||
---|---|---|
static
|
OneToManyJoin.oneToManyJoin(PTable<K,U> left,
PTable<K,V> right,
DoFn<Pair<U,Iterable<V>>,T> postProcessFn,
PType<T> ptype)
Performs a join on two tables, where the left table only contains a single value per key. |
|
static
|
OneToManyJoin.oneToManyJoin(PTable<K,U> left,
PTable<K,V> right,
DoFn<Pair<U,Iterable<V>>,T> postProcessFn,
PType<T> ptype,
int numReducers)
Supports a user-specified number of reducers for the one-to-many join. |
Uses of PCollection in org.apache.crunch.util |
---|
Methods in org.apache.crunch.util that return PCollection | ||
---|---|---|
|
CrunchTool.read(Source<T> source)
|
|
PCollection<String> |
CrunchTool.readTextFile(String pathName)
|
Methods in org.apache.crunch.util with parameters of type PCollection | ||
---|---|---|
static
|
PartitionUtils.getRecommendedPartitions(PCollection<T> pcollection)
|
|
static
|
PartitionUtils.getRecommendedPartitions(PCollection<T> pcollection,
org.apache.hadoop.conf.Configuration conf)
|
|
|
CrunchTool.materialize(PCollection<T> pcollection)
|
|
void |
CrunchTool.write(PCollection<?> pcollection,
Target target)
|
|
void |
CrunchTool.writeTextFile(PCollection<?> pcollection,
String pathName)
|
|
|||||||||
PREV NEXT | FRAMES NO FRAMES |