| Packages that use PCollection | |
|---|---|
| org.apache.crunch | Client-facing API and core abstractions. |
| org.apache.crunch.contrib.bloomfilter | Support for creating Bloom Filters. |
| org.apache.crunch.contrib.text | |
| org.apache.crunch.examples | Example applications demonstrating various aspects of Crunch. |
| org.apache.crunch.impl.mem | In-memory Pipeline implementation for rapid prototyping and testing. |
| org.apache.crunch.impl.mr | A Pipeline implementation that runs on Hadoop MapReduce. |
| org.apache.crunch.lib | Joining, sorting, aggregating, and other commonly used functionality. |
| org.apache.crunch.util | An assorted set of utilities. |
| Uses of PCollection in org.apache.crunch |
|---|
| Subinterfaces of PCollection in org.apache.crunch | |
|---|---|
interface |
PGroupedTable<K,V>
The Crunch representation of a grouped PTable. |
interface |
PTable<K,V>
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
| Methods in org.apache.crunch that return PCollection | ||
|---|---|---|
PCollection<S> |
PCollection.filter(FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting PCollection. |
|
PCollection<S> |
PCollection.filter(String name,
FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting PCollection. |
|
PCollection<K> |
PTable.keys()
Returns a PCollection made up of the keys in this PTable. |
|
|
PCollection.parallelDo(DoFn<S,T> doFn,
PType<T> type)
Applies the given doFn to the elements of this PCollection and
returns a new PCollection that is the output of this processing. |
|
|
PCollection.parallelDo(String name,
DoFn<S,T> doFn,
PType<T> type)
Applies the given doFn to the elements of this PCollection and
returns a new PCollection that is the output of this processing. |
|
|
PCollection.parallelDo(String name,
DoFn<S,T> doFn,
PType<T> type,
ParallelDoOptions options)
Applies the given doFn to the elements of this PCollection and
returns a new PCollection that is the output of this processing. |
|
|
Pipeline.read(Source<T> source)
Converts the given Source into a PCollection that is
available to jobs run using this Pipeline instance. |
|
PCollection<String> |
Pipeline.readTextFile(String pathName)
A convenience method for reading a text file. |
|
PCollection<S> |
PCollection.union(PCollection<S>... collections)
Returns a PCollection instance that acts as the union of this
PCollection and the input PCollections. |
|
PCollection<S> |
PCollection.union(PCollection<S> other)
Returns a PCollection instance that acts as the union of this
PCollection and the given PCollection. |
|
PCollection<V> |
PTable.values()
Returns a PCollection made up of the values in this PTable. |
|
PCollection<S> |
PCollection.write(Target target)
Write the contents of this PCollection to the given Target,
using the storage format specified by the target. |
|
PCollection<S> |
PCollection.write(Target target,
Target.WriteMode writeMode)
Write the contents of this PCollection to the given Target,
using the given Target.WriteMode to handle existing
targets. |
|
| Methods in org.apache.crunch with parameters of type PCollection | ||
|---|---|---|
|
Pipeline.materialize(PCollection<T> pcollection)
Create the given PCollection and read the data it contains into the returned Collection instance for client use. |
|
PCollection<S> |
PCollection.union(PCollection<S>... collections)
Returns a PCollection instance that acts as the union of this
PCollection and the input PCollections. |
|
PCollection<S> |
PCollection.union(PCollection<S> other)
Returns a PCollection instance that acts as the union of this
PCollection and the given PCollection. |
|
void |
Pipeline.write(PCollection<?> collection,
Target target)
Write the given collection to the given target on the next pipeline run. |
|
void |
Pipeline.write(PCollection<?> collection,
Target target,
Target.WriteMode writeMode)
Write the contents of the PCollection to the given Target,
using the storage format specified by the target and the given
WriteMode for cases where the referenced Target
already exists. |
|
|
Pipeline.writeTextFile(PCollection<T> collection,
String pathName)
A convenience method for writing a text file. |
|
| Uses of PCollection in org.apache.crunch.contrib.bloomfilter |
|---|
| Methods in org.apache.crunch.contrib.bloomfilter with parameters of type PCollection | ||
|---|---|---|
static
|
BloomFilterFactory.createFilter(PCollection<T> collection,
BloomFilterFn<T> filterFn)
|
|
| Uses of PCollection in org.apache.crunch.contrib.text |
|---|
| Methods in org.apache.crunch.contrib.text that return PCollection | ||
|---|---|---|
static
|
Parse.parse(String groupName,
PCollection<String> input,
Extractor<T> extractor)
Parses the lines of the input PCollection<String> and returns a PCollection<T> using
the given Extractor<T>. |
|
static
|
Parse.parse(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<T> extractor)
Parses the lines of the input PCollection<String> and returns a PCollection<T> using
the given Extractor<T> that uses the given PTypeFamily. |
|
| Methods in org.apache.crunch.contrib.text with parameters of type PCollection | ||
|---|---|---|
static
|
Parse.parse(String groupName,
PCollection<String> input,
Extractor<T> extractor)
Parses the lines of the input PCollection<String> and returns a PCollection<T> using
the given Extractor<T>. |
|
static
|
Parse.parse(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<T> extractor)
Parses the lines of the input PCollection<String> and returns a PCollection<T> using
the given Extractor<T> that uses the given PTypeFamily. |
|
static
|
Parse.parseTable(String groupName,
PCollection<String> input,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>>. |
|
static
|
Parse.parseTable(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> that uses the given PTypeFamily. |
|
| Uses of PCollection in org.apache.crunch.examples |
|---|
| Methods in org.apache.crunch.examples that return PCollection | |
|---|---|
PCollection<org.apache.hadoop.hbase.client.Put> |
WordAggregationHBase.createPut(PTable<String,String> extractedText)
Create Puts in order to insert them into HBase. |
| Uses of PCollection in org.apache.crunch.impl.mem |
|---|
| Methods in org.apache.crunch.impl.mem that return PCollection | ||
|---|---|---|
static
|
MemPipeline.collectionOf(Iterable<T> collect)
|
|
static
|
MemPipeline.collectionOf(T... ts)
|
|
|
MemPipeline.read(Source<T> source)
|
|
PCollection<String> |
MemPipeline.readTextFile(String pathName)
|
|
static
|
MemPipeline.typedCollectionOf(PType<T> ptype,
Iterable<T> collect)
|
|
static
|
MemPipeline.typedCollectionOf(PType<T> ptype,
T... ts)
|
|
| Methods in org.apache.crunch.impl.mem with parameters of type PCollection | ||
|---|---|---|
|
MemPipeline.materialize(PCollection<T> pcollection)
|
|
void |
MemPipeline.write(PCollection<?> collection,
Target target)
|
|
void |
MemPipeline.write(PCollection<?> collection,
Target target,
Target.WriteMode writeMode)
|
|
|
MemPipeline.writeTextFile(PCollection<T> collection,
String pathName)
|
|
| Uses of PCollection in org.apache.crunch.impl.mr |
|---|
| Methods in org.apache.crunch.impl.mr that return PCollection | ||
|---|---|---|
|
MRPipeline.read(Source<S> source)
|
|
PCollection<String> |
MRPipeline.readTextFile(String pathName)
|
|
| Methods in org.apache.crunch.impl.mr with parameters of type PCollection | ||
|---|---|---|
|
MRPipeline.getMaterializeSourceTarget(PCollection<T> pcollection)
Retrieve a ReadableSourceTarget that provides access to the contents of a PCollection. |
|
|
MRPipeline.materialize(PCollection<T> pcollection)
|
|
void |
MRPipeline.write(PCollection<?> pcollection,
Target target)
|
|
void |
MRPipeline.write(PCollection<?> pcollection,
Target target,
Target.WriteMode writeMode)
|
|
|
MRPipeline.writeTextFile(PCollection<T> pcollection,
String pathName)
|
|
| Uses of PCollection in org.apache.crunch.lib |
|---|
| Methods in org.apache.crunch.lib that return PCollection | ||
|---|---|---|
static
|
Set.comm(PCollection<T> coll1,
PCollection<T> coll2)
Find the elements that are common to two sets, like the Unix comm utility. |
|
static
|
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified PCollections (using the
same strategy as Pig's CROSS operator). |
|
static
|
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified PCollections (using the
same strategy as Pig's CROSS operator). |
|
static
|
Set.difference(PCollection<T> coll1,
PCollection<T> coll2)
Compute the set difference between two sets of elements. |
|
static
|
Distinct.distinct(PCollection<S> input)
Construct a new PCollection that contains the unique elements of a
given input PCollection. |
|
static
|
Distinct.distinct(PCollection<S> input,
int flushEvery)
A distinct operation that gives the client more control over how frequently
elements are flushed to disk in order to allow control over performance or
memory consumption. |
|
static
|
Sample.groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
int[] sampleSizes)
The most general-purpose of the weighted reservoir sampling patterns, allowing a random sample of elements to be chosen for each of N input groups. |
|
static
|
Sample.groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input,
int[] sampleSizes,
Long seed)
Same as the other groupedWeightedReservoirSample method, but includes a seed for testing purposes. |
|
static
|
Set.intersection(PCollection<T> coll1,
PCollection<T> coll2)
Compute the intersection of two sets of elements. |
|
static
|
PTables.keys(PTable<K,V> ptable)
Extract the keys from the given PTable<K, V> as a PCollection<K>. |
|
static
|
Sample.reservoirSample(PCollection<T> input,
int sampleSize)
Select a fixed number of elements from the given PCollection with each element
equally likely to be included in the sample. |
|
static
|
Sample.reservorSample(PCollection<T> input,
int sampleSize,
Long seed)
A version of the reservoir sampling algorithm that uses a given seed, primarily for testing purposes. |
|
static
|
Sample.sample(PCollection<S> input,
double probability)
Output records from the given PCollection with the given probability. |
|
static
|
Sample.sample(PCollection<S> input,
Long seed,
double probability)
Output records from the given PCollection using a given seed. |
|
static
|
Sort.sort(PCollection<T> collection)
Sorts the PCollection using the natural ordering of its elements in ascending order. |
|
static
|
Sort.sort(PCollection<T> collection,
int numReducers,
Sort.Order order)
Sorts the PCollection using the natural ordering of its elements in
the order specified using the given number of reducers. |
|
static
|
Sort.sort(PCollection<T> collection,
Sort.Order order)
Sorts the PCollection using the natural order of its elements with the given Order. |
|
static
|
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T>. |
|
static
|
Sort.sortPairs(PCollection<Pair<U,V>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of Pairs using the specified column
ordering. |
|
static
|
Sort.sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of Tuple4s using the specified column
ordering. |
|
static
|
Sort.sortTriples(PCollection<Tuple3<V1,V2,V3>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of Tuple3s using the specified column
ordering. |
|
static
|
Sort.sortTuples(PCollection<T> collection,
int numReducers,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of TupleNs using the specified column
ordering and a client-specified number of reducers. |
|
static
|
Sort.sortTuples(PCollection<T> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of tuples using the specified column ordering. |
|
static
|
PTables.values(PTable<K,V> ptable)
Extract the values from the given PTable<K, V> as a PCollection<V>. |
|
static
|
Sample.weightedReservoirSample(PCollection<Pair<T,N>> input,
int sampleSize)
Selects a weighted sample of the elements of the given PCollection, where the second term in
the input Pair is a numerical weight. |
|
static
|
Sample.weightedReservoirSample(PCollection<Pair<T,N>> input,
int sampleSize,
Long seed)
The weighted reservoir sampling function with the seed term exposed for testing purposes. |
|
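Several of the Sample methods listed above describe reservoir sampling: selecting a fixed number of elements so that each element is equally likely to be included. The following is a minimal plain-Java sketch of that idea (the classic single-pass "Algorithm R"), written against `java.util` only. It is not Crunch's implementation and does not use the Crunch API; the class name `ReservoirSketch` and the primitive `long seed` parameter are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative only: uniform reservoir sampling over an in-memory Iterable. */
class ReservoirSketch {

  /** Returns at most sampleSize elements; each input element is equally likely to be kept. */
  static <T> List<T> reservoirSample(Iterable<T> input, int sampleSize, long seed) {
    if (sampleSize < 0) {
      throw new IllegalArgumentException("sampleSize must be non-negative");
    }
    Random random = new Random(seed); // fixed seed makes the sample repeatable
    List<T> reservoir = new ArrayList<>(sampleSize);
    long seen = 0;
    for (T element : input) {
      seen++;
      if (reservoir.size() < sampleSize) {
        // Fill phase: the first sampleSize elements are always kept.
        reservoir.add(element);
      } else {
        // Replacement phase: keep the new element with probability sampleSize / seen.
        long slot = (long) (random.nextDouble() * seen);
        if (slot < sampleSize) {
          reservoir.set((int) slot, element);
        }
      }
    }
    return reservoir;
  }

  public static void main(String[] args) {
    List<Integer> data = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      data.add(i);
    }
    System.out.println(reservoirSample(data, 5, 42L));
  }
}
```

Passing the same seed twice yields the same sample, which is the property the seeded variants above expose for testing.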
| Methods in org.apache.crunch.lib with parameters of type PCollection | ||
|---|---|---|
static
|
PTables.asPTable(PCollection<Pair<K,V>> pcollect)
Convert the given PCollection<Pair<K, V>> to a PTable<K, V>. |
|
static
|
Set.comm(PCollection<T> coll1,
PCollection<T> coll2)
Find the elements that are common to two sets, like the Unix comm utility. |
|
static
|
Aggregate.count(PCollection<S> collect)
Returns a PTable that contains the unique elements of this collection mapped to a count
of their occurrences. |
|
static
|
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified PCollections (using the
same strategy as Pig's CROSS operator). |
|
static
|
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified PCollections (using the
same strategy as Pig's CROSS operator). |
|
static
|
Set.difference(PCollection<T> coll1,
PCollection<T> coll2)
Compute the set difference between two sets of elements. |
|
static
|
Distinct.distinct(PCollection<S> input)
Construct a new PCollection that contains the unique elements of a
given input PCollection. |
|
static
|
Distinct.distinct(PCollection<S> input,
int flushEvery)
A distinct operation that gives the client more control over how frequently
elements are flushed to disk in order to allow control over performance or
memory consumption. |
|
static
|
Set.intersection(PCollection<T> coll1,
PCollection<T> coll2)
Compute the intersection of two sets of elements. |
|
static
|
Aggregate.length(PCollection<S> collect)
Returns the number of elements in the provided PCollection. |
|
static
|
Aggregate.max(PCollection<S> collect)
Returns the largest numerical element from the input collection. |
|
static
|
Aggregate.min(PCollection<S> collect)
Returns the smallest numerical element from the input collection. |
|
static
|
Sample.reservoirSample(PCollection<T> input,
int sampleSize)
Select a fixed number of elements from the given PCollection with each element
equally likely to be included in the sample. |
|
static
|
Sample.reservorSample(PCollection<T> input,
int sampleSize,
Long seed)
A version of the reservoir sampling algorithm that uses a given seed, primarily for testing purposes. |
|
static
|
Sample.sample(PCollection<S> input,
double probability)
Output records from the given PCollection with the given probability. |
|
static
|
Sample.sample(PCollection<S> input,
Long seed,
double probability)
Output records from the given PCollection using a given seed. |
|
static
|
Sort.sort(PCollection<T> collection)
Sorts the PCollection using the natural ordering of its elements in ascending order. |
|
static
|
Sort.sort(PCollection<T> collection,
int numReducers,
Sort.Order order)
Sorts the PCollection using the natural ordering of its elements in
the order specified using the given number of reducers. |
|
static
|
Sort.sort(PCollection<T> collection,
Sort.Order order)
Sorts the PCollection using the natural order of its elements with the given Order. |
|
static
|
Sort.sortPairs(PCollection<Pair<U,V>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of Pairs using the specified column
ordering. |
|
static
|
Sort.sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of Tuple4s using the specified column
ordering. |
|
static
|
Sort.sortTriples(PCollection<Tuple3<V1,V2,V3>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of Tuple3s using the specified column
ordering. |
|
static
|
Sort.sortTuples(PCollection<T> collection,
int numReducers,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of TupleNs using the specified column
ordering and a client-specified number of reducers. |
|
static
|
Sort.sortTuples(PCollection<T> collection,
Sort.ColumnOrder... columnOrders)
Sorts the PCollection of tuples using the specified column ordering. |
|
static
|
Sample.weightedReservoirSample(PCollection<Pair<T,N>> input,
int sampleSize)
Selects a weighted sample of the elements of the given PCollection, where the second term in
the input Pair is a numerical weight. |
|
static
|
Sample.weightedReservoirSample(PCollection<Pair<T,N>> input,
int sampleSize,
Long seed)
The weighted reservoir sampling function with the seed term exposed for testing purposes. |
|
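The Set methods in the table above (comm, difference, intersection) are closely related: a comm-style comparison splits the elements of two sets into those found only in the first, only in the second, and in both, and the first and third groups correspond to the set difference and the intersection. The sketch below shows that relationship on in-memory collections using `java.util` only. It is an assumption-laden illustration, not Crunch code: `CommSketch` and its `Result` holder are hypothetical names, and the shape of the result does not claim to match what Crunch returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: comm-style comparison of two in-memory collections. */
class CommSketch {

  /** Holds the three groups produced by the comparison. */
  static final class Result<T> {
    final List<T> onlyFirst = new ArrayList<>();  // set difference: first minus second
    final List<T> onlySecond = new ArrayList<>(); // set difference: second minus first
    final List<T> both = new ArrayList<>();       // intersection
  }

  static <T> Result<T> comm(Iterable<T> coll1, Iterable<T> coll2) {
    // Treat both inputs as sets; LinkedHashSet keeps first-seen order for stable output.
    Set<T> first = new LinkedHashSet<>();
    for (T t : coll1) {
      first.add(t);
    }
    Set<T> second = new LinkedHashSet<>();
    for (T t : coll2) {
      second.add(t);
    }
    Result<T> result = new Result<>();
    for (T t : first) {
      if (second.contains(t)) {
        result.both.add(t);
      } else {
        result.onlyFirst.add(t);
      }
    }
    for (T t : second) {
      if (!first.contains(t)) {
        result.onlySecond.add(t);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Result<String> r = comm(Arrays.asList("a", "b", "c"), Arrays.asList("b", "c", "d"));
    // prints "[a] [b, c] [d]"
    System.out.println(r.onlyFirst + " " + r.both + " " + r.onlySecond);
  }
}
```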
| Uses of PCollection in org.apache.crunch.util |
|---|
| Methods in org.apache.crunch.util that return PCollection | ||
|---|---|---|
|
CrunchTool.read(Source<T> source)
|
|
PCollection<String> |
CrunchTool.readTextFile(String pathName)
|
|
| Methods in org.apache.crunch.util with parameters of type PCollection | ||
|---|---|---|
static
|
PartitionUtils.getRecommendedPartitions(PCollection<T> pcollection,
org.apache.hadoop.conf.Configuration conf)
|
|
|
CrunchTool.materialize(PCollection<T> pcollection)
|
|
void |
CrunchTool.write(PCollection<?> pcollection,
Target target)
|
|
void |
CrunchTool.writeTextFile(PCollection<?> pcollection,
String pathName)
|
|