Uses of Interface org.apache.crunch.PCollection (Apache Crunch 0.14.0 API)

Packages that use PCollection
Package	Description
org.apache.crunch	Client-facing API and core abstractions.
org.apache.crunch.contrib.bloomfilter	Support for creating Bloom Filters.
org.apache.crunch.contrib.text
org.apache.crunch.examples	Example applications demonstrating various aspects of Crunch.
org.apache.crunch.impl.dist
org.apache.crunch.impl.dist.collect
org.apache.crunch.impl.mem	In-memory Pipeline implementation for rapid prototyping and testing.
org.apache.crunch.impl.mr	A Pipeline implementation that runs on Hadoop MapReduce.
org.apache.crunch.impl.spark
org.apache.crunch.impl.spark.collect
org.apache.crunch.lambda	Alternative Crunch API using Java 8 features to allow construction of pipelines using lambda functions and method references.
org.apache.crunch.lib	Joining, sorting, aggregating, and other commonly used functionality.
org.apache.crunch.lib.join	Inner and outer joins on collections.
org.apache.crunch.util	An assorted set of utilities.

Uses of PCollection in org.apache.crunch

Subinterfaces of PCollection in org.apache.crunch
Modifier and Type	Interface and Description
`interface`	`PGroupedTable<K,V>` The Crunch representation of a grouped `PTable`, which corresponds to the output of the shuffle phase of a MapReduce job.
`interface`	`PTable<K,V>` A sub-interface of `PCollection` that represents an immutable, distributed multi-map of keys and values.

Methods in org.apache.crunch that return PCollection
Modifier and Type	Method and Description
`PCollection<S>`	PCollection.`aggregate(Aggregator<S> aggregator)` Returns a `PCollection` that contains the result of aggregating all values in this instance.
`PCollection<S>`	PCollection.`cache()` Marks this data as cached using the default `CachingOptions`.
`PCollection<S>`	PCollection.`cache(CachingOptions options)` Marks this data as cached using the given `CachingOptions`.
`<T> PCollection<T>`	Pipeline.`create(Iterable<T> contents, PType<T> ptype)` Creates a `PCollection` containing the values found in the given `Iterable` using an implementation-specific distribution mechanism.
`<T> PCollection<T>`	Pipeline.`create(Iterable<T> contents, PType<T> ptype, CreateOptions options)` Creates a `PCollection` containing the values found in the given `Iterable` using an implementation-specific distribution mechanism.
`<T> PCollection<T>`	Pipeline.`emptyPCollection(PType<T> ptype)` Creates an empty `PCollection` of the given `PType`.
`PCollection<S>`	PCollection.`filter(FilterFn<S> filterFn)` Apply the given filter function to this instance and return the resulting `PCollection`.
`PCollection<S>`	PCollection.`filter(String name, FilterFn<S> filterFn)` Apply the given filter function to this instance and return the resulting `PCollection`.
`PCollection<K>`	PTable.`keys()` Returns a `PCollection` made up of the keys in this PTable.
`<T> PCollection<T>`	PCollection.`parallelDo(DoFn<S,T> doFn, PType<T> type)` Applies the given doFn to the elements of this `PCollection` and returns a new `PCollection` that is the output of this processing.
`<T> PCollection<T>`	PCollection.`parallelDo(String name, DoFn<S,T> doFn, PType<T> type)` Applies the given doFn to the elements of this `PCollection` and returns a new `PCollection` that is the output of this processing.
`<T> PCollection<T>`	PCollection.`parallelDo(String name, DoFn<S,T> doFn, PType<T> type, ParallelDoOptions options)` Applies the given doFn to the elements of this `PCollection` and returns a new `PCollection` that is the output of this processing.
`<T> PCollection<T>`	Pipeline.`read(Source<T> source)` Converts the given `Source` into a `PCollection` that is available to jobs run using this `Pipeline` instance.
`<T> PCollection<T>`	Pipeline.`read(Source<T> source, String named)` Converts the given `Source` into a `PCollection` that is available to jobs run using this `Pipeline` instance.
`PCollection<String>`	Pipeline.`readTextFile(String pathName)` A convenience method for reading a text file.
`<S> PCollection<S>`	Pipeline.`union(List<PCollection<S>> collections)`
`PCollection<S>`	PCollection.`union(PCollection<S>... collections)` Returns a `PCollection` instance that acts as the union of this `PCollection` and the input `PCollection`s.
`PCollection<S>`	PCollection.`union(PCollection<S> other)` Returns a `PCollection` instance that acts as the union of this `PCollection` and the given `PCollection`.
`PCollection<V>`	PTable.`values()` Returns a `PCollection` made up of the values in this PTable.
`PCollection<S>`	PCollection.`write(Target target)` Write the contents of this `PCollection` to the given `Target`, using the storage format specified by the target.
`PCollection<S>`	PCollection.`write(Target target, Target.WriteMode writeMode)` Write the contents of this `PCollection` to the given `Target`, using the given `Target.WriteMode` to handle existing targets.

Methods in org.apache.crunch that return types with arguments of type PCollection
Modifier and Type	Method and Description
`Map<String,PCollection<?>>`	PipelineCallable.`getAllPCollections()` Returns the mapping of labels to PCollection dependencies for this instance.

Methods in org.apache.crunch with parameters of type PCollection
Modifier and Type	Method and Description
`<T> void`	Pipeline.`cache(PCollection<T> pcollection, CachingOptions options)` Caches the given PCollection so that it will be processed at most once during pipeline execution.
`PipelineCallable<Output>`	PipelineCallable.`dependsOn(String label, PCollection<?> pcollect)` Requires that the given `PCollection` be materialized to disk before this instance may be executed.
`<T> Iterable<T>`	Pipeline.`materialize(PCollection<T> pcollection)` Create the given PCollection and read the data it contains into the returned Collection instance for client use.
`PCollection<S>`	PCollection.`union(PCollection<S>... collections)` Returns a `PCollection` instance that acts as the union of this `PCollection` and the input `PCollection`s.
`PCollection<S>`	PCollection.`union(PCollection<S> other)` Returns a `PCollection` instance that acts as the union of this `PCollection` and the given `PCollection`.
`void`	Pipeline.`write(PCollection<?> collection, Target target)` Write the given collection to the given target on the next pipeline run.
`void`	Pipeline.`write(PCollection<?> collection, Target target, Target.WriteMode writeMode)` Write the contents of the `PCollection` to the given `Target`, using the storage format specified by the target and the given `WriteMode` for cases where the referenced `Target` already exists.
`<T> void`	Pipeline.`writeTextFile(PCollection<T> collection, String pathName)` A convenience method for writing a text file.

Method parameters in org.apache.crunch with type arguments of type PCollection
Modifier and Type	Method and Description
`<S> PCollection<S>`	Pipeline.`union(List<PCollection<S>> collections)`

Uses of PCollection in org.apache.crunch.contrib.bloomfilter

Methods in org.apache.crunch.contrib.bloomfilter with parameters of type PCollection
Modifier and Type	Method and Description
`static <T> PObject<org.apache.hadoop.util.bloom.BloomFilter>`	BloomFilterFactory.`createFilter(PCollection<T> collection, BloomFilterFn<T> filterFn)`

Uses of PCollection in org.apache.crunch.contrib.text

Methods in org.apache.crunch.contrib.text that return PCollection
Modifier and Type	Method and Description
`static <T> PCollection<T>`	Parse.`parse(String groupName, PCollection<String> input, Extractor<T> extractor)` Parses the lines of the input `PCollection<String>` and returns a `PCollection<T>` using the given `Extractor<T>`.
`static <T> PCollection<T>`	Parse.`parse(String groupName, PCollection<String> input, PTypeFamily ptf, Extractor<T> extractor)` Parses the lines of the input `PCollection<String>` and returns a `PCollection<T>` using the given `Extractor<T>` that uses the given `PTypeFamily`.

Methods in org.apache.crunch.contrib.text with parameters of type PCollection
Modifier and Type	Method and Description
`static <T> PCollection<T>`	Parse.`parse(String groupName, PCollection<String> input, Extractor<T> extractor)` Parses the lines of the input `PCollection<String>` and returns a `PCollection<T>` using the given `Extractor<T>`.
`static <T> PCollection<T>`	Parse.`parse(String groupName, PCollection<String> input, PTypeFamily ptf, Extractor<T> extractor)` Parses the lines of the input `PCollection<String>` and returns a `PCollection<T>` using the given `Extractor<T>` that uses the given `PTypeFamily`.
`static <K,V> PTable<K,V>`	Parse.`parseTable(String groupName, PCollection<String> input, Extractor<Pair<K,V>> extractor)` Parses the lines of the input `PCollection<String>` and returns a `PTable<K, V>` using the given `Extractor<Pair<K, V>>`.
`static <K,V> PTable<K,V>`	Parse.`parseTable(String groupName, PCollection<String> input, PTypeFamily ptf, Extractor<Pair<K,V>> extractor)` Parses the lines of the input `PCollection<String>` and returns a `PTable<K, V>` using the given `Extractor<Pair<K, V>>` that uses the given `PTypeFamily`.

Uses of PCollection in org.apache.crunch.examples

Methods in org.apache.crunch.examples that return PCollection
Modifier and Type	Method and Description
`PCollection<org.apache.hadoop.hbase.client.Put>`	WordAggregationHBase.`createPut(PTable<String,String> extractedText)` Create puts in order to insert them in hbase.

Uses of PCollection in org.apache.crunch.impl.dist

Methods in org.apache.crunch.impl.dist that return PCollection
Modifier and Type	Method and Description
`<S> PCollection<S>`	DistributedPipeline.`create(Iterable<S> contents, PType<S> ptype)`
`<S> PCollection<S>`	DistributedPipeline.`create(Iterable<S> contents, PType<S> ptype, CreateOptions options)`
`<S> PCollection<S>`	DistributedPipeline.`emptyPCollection(PType<S> ptype)`
`<S> PCollection<S>`	DistributedPipeline.`read(Source<S> source)`
`<S> PCollection<S>`	DistributedPipeline.`read(Source<S> source, String named)`
`PCollection<String>`	DistributedPipeline.`readTextFile(String pathName)`
`<S> PCollection<S>`	DistributedPipeline.`union(List<PCollection<S>> collections)`

Methods in org.apache.crunch.impl.dist with parameters of type PCollection
Modifier and Type	Method and Description
`<T> ReadableSource<T>`	DistributedPipeline.`getMaterializeSourceTarget(PCollection<T> pcollection)` Retrieve a ReadableSourceTarget that provides access to the contents of a `PCollection`.
`void`	DistributedPipeline.`write(PCollection<?> pcollection, Target target)`
`void`	DistributedPipeline.`write(PCollection<?> pcollection, Target target, Target.WriteMode writeMode)`
`<T> void`	DistributedPipeline.`writeTextFile(PCollection<T> pcollection, String pathName)`

Method parameters in org.apache.crunch.impl.dist with type arguments of type PCollection
Modifier and Type	Method and Description
`<S> PCollection<S>`	DistributedPipeline.`union(List<PCollection<S>> collections)`

Uses of PCollection in org.apache.crunch.impl.dist.collect

Classes in org.apache.crunch.impl.dist.collect that implement PCollection
Modifier and Type	Class and Description
`class`	`BaseDoCollection<S>`
`class`	`BaseDoTable<K,V>`
`class`	`BaseGroupedTable<K,V>`
`class`	`BaseInputCollection<S>`
`class`	`BaseInputTable<K,V>`
`class`	`BaseUnionCollection<S>`
`class`	`BaseUnionTable<K,V>`
`class`	`EmptyPCollection<T>`
`class`	`EmptyPTable<K,V>`
`class`	`PCollectionImpl<S>`
`class`	`PTableBase<K,V>`

Methods in org.apache.crunch.impl.dist.collect that return PCollection
Modifier and Type	Method and Description
`PCollection<S>`	PCollectionImpl.`aggregate(Aggregator<S> aggregator)`
`PCollection<S>`	PCollectionImpl.`cache()`
`PCollection<S>`	PCollectionImpl.`cache(CachingOptions options)`
`PCollection<S>`	PCollectionImpl.`filter(FilterFn<S> filterFn)`
`PCollection<S>`	PCollectionImpl.`filter(String name, FilterFn<S> filterFn)`
`PCollection<K>`	PTableBase.`keys()`
`<T> PCollection<T>`	PCollectionImpl.`parallelDo(DoFn<S,T> fn, PType<T> type)`
`<T> PCollection<T>`	PCollectionImpl.`parallelDo(String name, DoFn<S,T> fn, PType<T> type)`
`<T> PCollection<T>`	PCollectionImpl.`parallelDo(String name, DoFn<S,T> fn, PType<T> type, ParallelDoOptions options)`
`PCollection<S>`	PCollectionImpl.`union(PCollection<S>... collections)`
`PCollection<S>`	PCollectionImpl.`union(PCollection<S> other)`
`PCollection<V>`	PTableBase.`values()`
`PCollection<S>`	PCollectionImpl.`write(Target target)`
`PCollection<S>`	PCollectionImpl.`write(Target target, Target.WriteMode writeMode)`

Methods in org.apache.crunch.impl.dist.collect with parameters of type PCollection
Modifier and Type	Method and Description
`PCollection<S>`	PCollectionImpl.`union(PCollection<S>... collections)`
`PCollection<S>`	PCollectionImpl.`union(PCollection<S> other)`

Uses of PCollection in org.apache.crunch.impl.mem

Methods in org.apache.crunch.impl.mem that return PCollection
Modifier and Type	Method and Description
`static <T> PCollection<T>`	MemPipeline.`collectionOf(Iterable<T> collect)`
`static <T> PCollection<T>`	MemPipeline.`collectionOf(T... ts)`
`<T> PCollection<T>`	MemPipeline.`create(Iterable<T> contents, PType<T> ptype)`
`<T> PCollection<T>`	MemPipeline.`create(Iterable<T> iterable, PType<T> ptype, CreateOptions options)`
`<T> PCollection<T>`	MemPipeline.`emptyPCollection(PType<T> ptype)`
`<T> PCollection<T>`	MemPipeline.`read(Source<T> source)`
`<T> PCollection<T>`	MemPipeline.`read(Source<T> source, String named)`
`PCollection<String>`	MemPipeline.`readTextFile(String pathName)`
`static <T> PCollection<T>`	MemPipeline.`typedCollectionOf(PType<T> ptype, Iterable<T> collect)`
`static <T> PCollection<T>`	MemPipeline.`typedCollectionOf(PType<T> ptype, T... ts)`
`<S> PCollection<S>`	MemPipeline.`union(List<PCollection<S>> collections)`

Methods in org.apache.crunch.impl.mem with parameters of type PCollection
Modifier and Type	Method and Description
`<T> void`	MemPipeline.`cache(PCollection<T> pcollection, CachingOptions options)`
`<T> Iterable<T>`	MemPipeline.`materialize(PCollection<T> pcollection)`
`void`	MemPipeline.`write(PCollection<?> collection, Target target)`
`void`	MemPipeline.`write(PCollection<?> collection, Target target, Target.WriteMode writeMode)`
`<T> void`	MemPipeline.`writeTextFile(PCollection<T> collection, String pathName)`

Method parameters in org.apache.crunch.impl.mem with type arguments of type PCollection
Modifier and Type	Method and Description
`<S> PCollection<S>`	MemPipeline.`union(List<PCollection<S>> collections)`

Uses of PCollection in org.apache.crunch.impl.mr

Methods in org.apache.crunch.impl.mr with parameters of type PCollection
Modifier and Type	Method and Description
`<T> void`	MRPipeline.`cache(PCollection<T> pcollection, CachingOptions options)`
`<T> Iterable<T>`	MRPipeline.`materialize(PCollection<T> pcollection)`

Uses of PCollection in org.apache.crunch.impl.spark

Methods in org.apache.crunch.impl.spark that return PCollection
Modifier and Type	Method and Description
`<S> PCollection<S>`	SparkPipeline.`create(Iterable<S> contents, PType<S> ptype, CreateOptions options)`
`<S> PCollection<S>`	SparkPipeline.`emptyPCollection(PType<S> ptype)`

Methods in org.apache.crunch.impl.spark with parameters of type PCollection
Modifier and Type	Method and Description
`<T> void`	SparkPipeline.`cache(PCollection<T> pcollection, CachingOptions options)`
`org.apache.spark.storage.StorageLevel`	SparkRuntime.`getStorageLevel(PCollection<?> pcollection)`
`<T> Iterable<T>`	SparkPipeline.`materialize(PCollection<T> pcollection)`

Constructor parameters in org.apache.crunch.impl.spark with type arguments of type PCollection
Constructor and Description
`SparkRuntime(SparkPipeline pipeline, org.apache.spark.api.java.JavaSparkContext sparkContext, org.apache.hadoop.conf.Configuration conf, Map<PCollectionImpl<?>,Set<Target>> outputTargets, Map<PCollectionImpl<?>,org.apache.crunch.materialize.MaterializableIterable> toMaterialize, Map<PCollection<?>,org.apache.spark.storage.StorageLevel> toCache, Map<PipelineCallable<?>,Set<Target>> allPipelineCallables)`

Uses of PCollection in org.apache.crunch.impl.spark.collect

Classes in org.apache.crunch.impl.spark.collect that implement PCollection
Modifier and Type	Class and Description
`class`	`CreatedCollection<T>` Represents a Spark-based PCollection that was created from a Java `Iterable` of values.
`class`	`CreatedTable<K,V>` Represents a Spark-based PTable that was created from a Java `Iterable` of key-value pairs.
`class`	`DoCollection<S>`
`class`	`DoTable<K,V>`
`class`	`InputCollection<S>`
`class`	`InputTable<K,V>`
`class`	`PGroupedTableImpl<K,V>`
`class`	`UnionCollection<S>`
`class`	`UnionTable<K,V>`

Uses of PCollection in org.apache.crunch.lambda

Methods in org.apache.crunch.lambda that return PCollection
Modifier and Type	Method and Description
`PCollection<S>`	LCollection.`underlying()` Get the underlying `PCollection` for this LCollection

Methods in org.apache.crunch.lambda with parameters of type PCollection
Modifier and Type	Method and Description
`default LCollection<S>`	LCollection.`union(PCollection<S> other)` Union this LCollection with a `PCollection` of the same type
`<S> LCollection<S>`	LCollectionFactory.`wrap(PCollection<S> collection)` Wrap a PCollection into an LCollection
`static <S> LCollection<S>`	Lambda.`wrap(PCollection<S> collection)`

Uses of PCollection in org.apache.crunch.lib

Methods in org.apache.crunch.lib that return PCollection
Modifier and Type	Method and Description
`static <S> PCollection<S>`	Aggregate.`aggregate(PCollection<S> collect, Aggregator<S> aggregator)`
`static <T> PCollection<Tuple3<T,T,T>>`	Set.`comm(PCollection<T> coll1, PCollection<T> coll2)` Find the elements that are common to two sets, like the Unix `comm` utility.
`static <U,V> PCollection<Pair<U,V>>`	Cartesian.`cross(PCollection<U> left, PCollection<V> right)` Performs a full cross join on the specified `PCollection`s (using the same strategy as Pig's CROSS operator).
`static <U,V> PCollection<Pair<U,V>>`	Cartesian.`cross(PCollection<U> left, PCollection<V> right, int parallelism)` Performs a full cross join on the specified `PCollection`s (using the same strategy as Pig's CROSS operator).
`static <T> PCollection<T>`	Set.`difference(PCollection<T> coll1, PCollection<T> coll2)` Compute the set difference between two sets of elements.
`static <S> PCollection<S>`	Distinct.`distinct(PCollection<S> input)` Construct a new `PCollection` that contains the unique elements of a given input `PCollection`.
`static <S> PCollection<S>`	Distinct.`distinct(PCollection<S> input, int flushEvery)` A `distinct` operation that gives the client more control over how frequently elements are flushed to disk in order to allow control over performance or memory consumption.
`static <T,N extends Number> PCollection<Pair<Integer,T>>`	Sample.`groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input, int[] sampleSizes)` The most general purpose of the weighted reservoir sampling patterns that allows us to choose a random sample of elements for each of N input groups.
`static <T,N extends Number> PCollection<Pair<Integer,T>>`	Sample.`groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input, int[] sampleSizes, Long seed)` Same as the other groupedWeightedReservoirSample method, but include a seed for testing purposes.
`static <T> PCollection<T>`	Set.`intersection(PCollection<T> coll1, PCollection<T> coll2)` Compute the intersection of two sets of elements.
`static <K,V> PCollection<K>`	PTables.`keys(PTable<K,V> ptable)` Extract the keys from the given `PTable<K, V>` as a `PCollection<K>`.
`static <T> PCollection<T>`	Sample.`reservoirSample(PCollection<T> input, int sampleSize)` Select a fixed number of elements from the given `PCollection` with each element equally likely to be included in the sample.
`static <T> PCollection<T>`	Sample.`reservoirSample(PCollection<T> input, int sampleSize, Long seed)` A version of the reservoir sampling algorithm that uses a given seed, primarily for testing purposes.
`static <S> PCollection<S>`	Sample.`sample(PCollection<S> input, double probability)` Output records from the given `PCollection` with the given probability.
`static <S> PCollection<S>`	Sample.`sample(PCollection<S> input, Long seed, double probability)` Output records from the given `PCollection` using a given seed.
`static <T> PCollection<T>`	Shard.`shard(PCollection<T> pc, int numPartitions)` Creates a `PCollection<T>` that has the same contents as its input argument but will be written to a fixed number of output files.
`static <T> PCollection<T>`	Sort.`sort(PCollection<T> collection)` Sorts the `PCollection` using the natural ordering of its elements in ascending order.
`static <T> PCollection<T>`	Sort.`sort(PCollection<T> collection, int numReducers, Sort.Order order)` Sorts the `PCollection` using the natural ordering of its elements in the order specified using the given number of reducers.
`static <T> PCollection<T>`	Sort.`sort(PCollection<T> collection, Sort.Order order)` Sorts the `PCollection` using the natural order of its elements with the given `Order`.
`static <K,V1,V2,T> PCollection<T>`	SecondarySort.`sortAndApply(PTable<K,Pair<V1,V2>> input, DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn, PType<T> ptype)` Perform a secondary sort on the given `PTable` instance and then apply a `DoFn` to the resulting sorted data to yield an output `PCollection<T>`.
`static <K,V1,V2,T> PCollection<T>`	SecondarySort.`sortAndApply(PTable<K,Pair<V1,V2>> input, DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn, PType<T> ptype, int numReducers)` Perform a secondary sort on the given `PTable` instance and then apply a `DoFn` to the resulting sorted data to yield an output `PCollection<T>`, using the given number of reducers.
`static <U,V> PCollection<Pair<U,V>>`	Sort.`sortPairs(PCollection<Pair<U,V>> collection, Sort.ColumnOrder... columnOrders)` Sorts the `PCollection` of `Pair`s using the specified column ordering.
`static <V1,V2,V3,V4> PCollection<Tuple4<V1,V2,V3,V4>>`	Sort.`sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection, Sort.ColumnOrder... columnOrders)` Sorts the `PCollection` of `Tuple4`s using the specified column ordering.
`static <V1,V2,V3> PCollection<Tuple3<V1,V2,V3>>`	Sort.`sortTriples(PCollection<Tuple3<V1,V2,V3>> collection, Sort.ColumnOrder... columnOrders)` Sorts the `PCollection` of `Tuple3`s using the specified column ordering.
`static <T extends Tuple> PCollection<T>`	Sort.`sortTuples(PCollection<T> collection, int numReducers, Sort.ColumnOrder... columnOrders)` Sorts the `PCollection` of `TupleN`s using the specified column ordering and a client-specified number of reducers.
`static <T extends Tuple> PCollection<T>`	Sort.`sortTuples(PCollection<T> collection, Sort.ColumnOrder... columnOrders)` Sorts the `PCollection` of tuples using the specified column ordering.
`static <K,V> PCollection<V>`	PTables.`values(PTable<K,V> ptable)` Extract the values from the given `PTable<K, V>` as a `PCollection<V>`.
`static <T,N extends Number> PCollection<T>`	Sample.`weightedReservoirSample(PCollection<Pair<T,N>> input, int sampleSize)` Selects a weighted sample of the elements of the given `PCollection`, where the second term in the input `Pair` is a numerical weight.
`static <T,N extends Number> PCollection<T>`	Sample.`weightedReservoirSample(PCollection<Pair<T,N>> input, int sampleSize, Long seed)` The weighted reservoir sampling function with the seed term exposed for testing purposes.

Methods in org.apache.crunch.lib that return types with arguments of type PCollection
Modifier and Type	Method and Description
`static <T,U> Pair<PCollection<T>,PCollection<U>>`	Channels.`split(PCollection<Pair<T,U>> pCollection)` Splits a `PCollection` of any `Pair` of objects into a Pair of PCollection}, to allow for the output of a DoFn to be handled using separate channels.
`static <T,U> Pair<PCollection<T>,PCollection<U>>`	Channels.`split(PCollection<Pair<T,U>> pCollection)` Splits a `PCollection` of any `Pair` of objects into a Pair of PCollection}, to allow for the output of a DoFn to be handled using separate channels.
`static <T,U> Pair<PCollection<T>,PCollection<U>>`	Channels.`split(PCollection<Pair<T,U>> pCollection, PType<T> firstPType, PType<U> secondPType)` Splits a `PCollection` of any `Pair` of objects into a Pair of PCollection}, to allow for the output of a DoFn to be handled using separate channels.
`static <T,U> Pair<PCollection<T>,PCollection<U>>`	Channels.`split(PCollection<Pair<T,U>> pCollection, PType<T> firstPType, PType<U> secondPType)` Splits a `PCollection` of any `Pair` of objects into a Pair of PCollection}, to allow for the output of a DoFn to be handled using separate channels.

Methods in org.apache.crunch.lib with parameters of type PCollection
Modifier and Type	Method and Description
`static <S> PCollection<S>`	Aggregate.`aggregate(PCollection<S> collect, Aggregator<S> aggregator)`
`static <K,V> PTable<K,V>`	PTables.`asPTable(PCollection<Pair<K,V>> pcollect)` Convert the given `PCollection<Pair<K, V>>` to a `PTable<K, V>`.
`static <T> PCollection<Tuple3<T,T,T>>`	Set.`comm(PCollection<T> coll1, PCollection<T> coll2)` Find the elements that are common to two sets, like the Unix `comm` utility.
`static <T> PCollection<Tuple3<T,T,T>>`	Set.`comm(PCollection<T> coll1, PCollection<T> coll2)` Find the elements that are common to two sets, like the Unix `comm` utility.
`static <S> PTable<S,Long>`	Aggregate.`count(PCollection<S> collect)` Returns a `PTable` that contains the unique elements of this collection mapped to a count of their occurrences.
`static <S> PTable<S,Long>`	Aggregate.`count(PCollection<S> collect, int numPartitions)` Returns a `PTable` that contains the unique elements of this collection mapped to a count of their occurrences.
`static <U,V> PCollection<Pair<U,V>>`	Cartesian.`cross(PCollection<U> left, PCollection<V> right)` Performs a full cross join on the specified `PCollection`s (using the same strategy as Pig's CROSS operator).
`static <U,V> PCollection<Pair<U,V>>`	Cartesian.`cross(PCollection<U> left, PCollection<V> right)` Performs a full cross join on the specified `PCollection`s (using the same strategy as Pig's CROSS operator).
`static <U,V> PCollection<Pair<U,V>>`	Cartesian.`cross(PCollection<U> left, PCollection<V> right, int parallelism)` Performs a full cross join on the specified `PCollection`s (using the same strategy as Pig's CROSS operator).
`static <U,V> PCollection<Pair<U,V>>`	Cartesian.`cross(PCollection<U> left, PCollection<V> right, int parallelism)` Performs a full cross join on the specified `PCollection`s (using the same strategy as Pig's CROSS operator).
`static <T> PCollection<T>`	Set.`difference(PCollection<T> coll1, PCollection<T> coll2)` Compute the set difference between two sets of elements.
`static <T> PCollection<T>`	Set.`difference(PCollection<T> coll1, PCollection<T> coll2)` Compute the set difference between two sets of elements.
`static <S> PCollection<S>`	Distinct.`distinct(PCollection<S> input)` Construct a new `PCollection` that contains the unique elements of a given input `PCollection`.
`static <S> PCollection<S>`	Distinct.`distinct(PCollection<S> input, int flushEvery)` A `distinct` operation that gives the client more control over how frequently elements are flushed to disk in order to allow control over performance or memory consumption.
`static <X> PTable<X,Long>`	TopList.`globalToplist(PCollection<X> input)` Create a list of unique items in the input collection with their count, sorted descending by their frequency.
`static <T> PCollection<T>`	Set.`intersection(PCollection<T> coll1, PCollection<T> coll2)` Compute the intersection of two sets of elements.
`static <T> PCollection<T>`	Set.`intersection(PCollection<T> coll1, PCollection<T> coll2)` Compute the intersection of two sets of elements.
`static <S> PObject<Long>`	Aggregate.`length(PCollection<S> collect)` Returns the number of elements in the provided PCollection.
`static <S> PObject<S>`	Aggregate.`max(PCollection<S> collect)` Returns the largest numerical element from the input collection.
`static <S> PObject<S>`	Aggregate.`min(PCollection<S> collect)` Returns the smallest numerical element from the input collection.
`static <T> PCollection<T>`	Sample.`reservoirSample(PCollection<T> input, int sampleSize)` Select a fixed number of elements from the given `PCollection` with each element equally likely to be included in the sample.
`static <T> PCollection<T>`	Sample.`reservoirSample(PCollection<T> input, int sampleSize, Long seed)` A version of the reservoir sampling algorithm that uses a given seed, primarily for testing purposes.
`static <S> PCollection<S>`	Sample.`sample(PCollection<S> input, double probability)` Output records from the given `PCollection` with the given probability.
`static <S> PCollection<S>`	Sample.`sample(PCollection<S> input, Long seed, double probability)` Output records from the given `PCollection` using a given seed.
`static <T> PCollection<T>`	Shard.`shard(PCollection<T> pc, int numPartitions)` Creates a `PCollection<T>` that has the same contents as its input argument but will be written to a fixed number of output files.
`static <T> PCollection<T>`	Sort.`sort(PCollection<T> collection)` Sorts the `PCollection` using the natural ordering of its elements in ascending order.
`static <T> PCollection<T>`	Sort.`sort(PCollection<T> collection, int numReducers, Sort.Order order)` Sorts the `PCollection` using the natural ordering of its elements in the order specified using the given number of reducers.
`static <T> PCollection<T>`	Sort.`sort(PCollection<T> collection, Sort.Order order)` Sorts the `PCollection` using the natural order of its elements with the given `Order`.
`static <U,V> PCollection<Pair<U,V>>`	Sort.`sortPairs(PCollection<Pair<U,V>> collection, Sort.ColumnOrder... columnOrders)` Sorts the `PCollection` of `Pair`s using the specified column ordering.
`static <V1,V2,V3,V4> PCollection<Tuple4<V1,V2,V3,V4>>`	Sort.`sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection, Sort.ColumnOrder... columnOrders)` Sorts the `PCollection` of `Tuple4`s using the specified column ordering.
`static <V1,V2,V3> PCollection<Tuple3<V1,V2,V3>>`	Sort.`sortTriples(PCollection<Tuple3<V1,V2,V3>> collection, Sort.ColumnOrder... columnOrders)` Sorts the `PCollection` of `Tuple3`s using the specified column ordering.
`static <T extends Tuple> PCollection<T>`	Sort.`sortTuples(PCollection<T> collection, int numReducers, Sort.ColumnOrder... columnOrders)` Sorts the `PCollection` of `TupleN`s using the specified column ordering and a client-specified number of reducers.
`static <T extends Tuple> PCollection<T>`	Sort.`sortTuples(PCollection<T> collection, Sort.ColumnOrder... columnOrders)` Sorts the `PCollection` of tuples using the specified column ordering.
`static <T,U> Pair<PCollection<T>,PCollection<U>>`	Channels.`split(PCollection<Pair<T,U>> pCollection)` Splits a `PCollection` of any `Pair` of objects into a Pair of PCollection}, to allow for the output of a DoFn to be handled using separate channels.
`static <T,U> Pair<PCollection<T>,PCollection<U>>`	Channels.`split(PCollection<Pair<T,U>> pCollection, PType<T> firstPType, PType<U> secondPType)` Splits a `PCollection` of any `Pair` of objects into a Pair of PCollection}, to allow for the output of a DoFn to be handled using separate channels.
`static <T,N extends Number> PCollection<T>`	Sample.`weightedReservoirSample(PCollection<Pair<T,N>> input, int sampleSize)` Selects a weighted sample of the elements of the given `PCollection`, where the second term in the input `Pair` is a numerical weight.
`static <T,N extends Number> PCollection<T>`	Sample.`weightedReservoirSample(PCollection<Pair<T,N>> input, int sampleSize, Long seed)` The weighted reservoir sampling function with the seed term exposed for testing purposes.

Uses of PCollection in org.apache.crunch.lib.join

Methods in org.apache.crunch.lib.join that return PCollection
Modifier and Type	Method and Description
`static <K,U,V,T> PCollection<T>`	OneToManyJoin.`oneToManyJoin(PTable<K,U> left, PTable<K,V> right, DoFn<Pair<U,Iterable<V>>,T> postProcessFn, PType<T> ptype)` Performs a join on two tables, where the left table only contains a single value per key.
`static <K,U,V,T> PCollection<T>`	OneToManyJoin.`oneToManyJoin(PTable<K,U> left, PTable<K,V> right, DoFn<Pair<U,Iterable<V>>,T> postProcessFn, PType<T> ptype, int numReducers)` Supports a user-specified number of reducers for the one-to-many join.

Uses of PCollection in org.apache.crunch.util

Methods in org.apache.crunch.util that return PCollection
Modifier and Type	Method and Description
`<T> PCollection<T>`	CrunchTool.`read(Source<T> source)`
`PCollection<String>`	CrunchTool.`readTextFile(String pathName)`

Methods in org.apache.crunch.util with parameters of type PCollection
Modifier and Type	Method and Description
`static <T> int`	PartitionUtils.`getRecommendedPartitions(PCollection<T> pcollection)`
`static <T> int`	PartitionUtils.`getRecommendedPartitions(PCollection<T> pcollection, org.apache.hadoop.conf.Configuration conf)`
`<T> Iterable<T>`	CrunchTool.`materialize(PCollection<T> pcollection)`
`void`	CrunchTool.`write(PCollection<?> pcollection, Target target)`
`void`	CrunchTool.`writeTextFile(PCollection<?> pcollection, String pathName)`

Uses of Interfaceorg.apache.crunch.PCollection

Uses of PCollection in org.apache.crunch

Uses of PCollection in org.apache.crunch.contrib.bloomfilter

Uses of PCollection in org.apache.crunch.contrib.text

Uses of PCollection in org.apache.crunch.examples

Uses of PCollection in org.apache.crunch.impl.dist

Uses of PCollection in org.apache.crunch.impl.dist.collect

Uses of PCollection in org.apache.crunch.impl.mem

Uses of PCollection in org.apache.crunch.impl.mr

Uses of PCollection in org.apache.crunch.impl.spark

Uses of PCollection in org.apache.crunch.impl.spark.collect

Uses of PCollection in org.apache.crunch.lambda

Uses of PCollection in org.apache.crunch.lib

Uses of PCollection in org.apache.crunch.lib.join

Uses of PCollection in org.apache.crunch.util

Uses of Interface
org.apache.crunch.PCollection