Uses of Interface org.apache.crunch.PCollection (Apache Crunch 0.10.0 API)

This project has retired. For details please refer to its Attic page.

Overview

Package

Class

Use

Tree

Deprecated

Index

PREV NEXT

FRAMES NO FRAMES

Uses of Interface
org.apache.crunch.PCollection

Packages that use PCollection
org.apache.crunch	Client-facing API and core abstractions.
org.apache.crunch.contrib.bloomfilter	Support for creating Bloom Filters.
org.apache.crunch.contrib.text
org.apache.crunch.examples	Example applications demonstrating various aspects of Crunch.
org.apache.crunch.impl.dist
org.apache.crunch.impl.dist.collect
org.apache.crunch.impl.mem	In-memory Pipeline implementation for rapid prototyping and testing.
org.apache.crunch.impl.mr	A Pipeline implementation that runs on Hadoop MapReduce.
org.apache.crunch.impl.spark
org.apache.crunch.impl.spark.collect
org.apache.crunch.lib	Joining, sorting, aggregating, and other commonly used functionality.
org.apache.crunch.lib.join	Inner and outer joins on collections.
org.apache.crunch.util	An assorted set of utilities.

Uses of PCollection in org.apache.crunch

Subinterfaces of PCollection in org.apache.crunch

interface PGroupedTable<K,V>
The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job.

interface PTable<K,V>
A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.

Subinterfaces of PCollection in org.apache.crunch
`interface`	`PGroupedTable<K,V>` The Crunch representation of a grouped `PTable`, which corresponds to the output of the shuffle phase of a MapReduce job.
`interface`	`PTable<K,V>` A sub-interface of `PCollection` that represents an immutable, distributed multi-map of keys and values.

Methods in org.apache.crunch that return PCollection

PCollection<S> PCollection.aggregate(Aggregator<S> aggregator)
 Returns a PCollection that contains the result of aggregating all values in this instance.

PCollection<S> PCollection.cache()
 Marks this data as cached using the default CachingOptions.

PCollection<S> PCollection.cache(CachingOptions options)
 Marks this data as cached using the given CachingOptions.

<T> PCollection<T> Pipeline.emptyPCollection(PType<T> ptype)


PCollection<S> PCollection.filter(FilterFn<S> filterFn)
 Apply the given filter function to this instance and return the resulting PCollection.

PCollection<S> PCollection.filter(String name, FilterFn<S> filterFn)
 Apply the given filter function to this instance and return the resulting PCollection.

PCollection<K> PTable.keys()
 Returns a PCollection made up of the keys in this PTable.

<T> PCollection<T> PCollection.parallelDo(DoFn<S,T> doFn, PType<T> type)
 Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.

<T> PCollection<T> PCollection.parallelDo(String name, DoFn<S,T> doFn, PType<T> type)
 Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.

<T> PCollection<T> PCollection.parallelDo(String name, DoFn<S,T> doFn, PType<T> type, ParallelDoOptions options)
 Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.

<T> PCollection<T> Pipeline.read(Source<T> source)
 Converts the given Source into a PCollection that is available to jobs run using this Pipeline instance.

PCollection<String> Pipeline.readTextFile(String pathName)
 A convenience method for reading a text file.

PCollection<S> PCollection.union(PCollection<S>... collections)
 Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.

PCollection<S> PCollection.union(PCollection<S> other)
 Returns a PCollection instance that acts as the union of this PCollection and the given PCollection.

PCollection<V> PTable.values()
 Returns a PCollection made up of the values in this PTable.

PCollection<S> PCollection.write(Target target)
 Write the contents of this PCollection to the given Target, using the storage format specified by the target.

PCollection<S> PCollection.write(Target target, Target.WriteMode writeMode)
 Write the contents of this PCollection to the given Target, using the given Target.WriteMode to handle existing targets.

Methods in org.apache.crunch with parameters of type PCollection

<T> void Pipeline.cache(PCollection<T> pcollection, CachingOptions options)
 Caches the given PCollection so that it will be processed at most once during pipeline execution.

<T> Iterable<T> Pipeline.materialize(PCollection<T> pcollection)
 Create the given PCollection and read the data it contains into the returned Collection instance for client use.

PCollection<S> PCollection.union(PCollection<S>... collections)
 Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.

PCollection<S> PCollection.union(PCollection<S> other)
 Returns a PCollection instance that acts as the union of this PCollection and the given PCollection.

void Pipeline.write(PCollection<?> collection, Target target)
 Write the given collection to the given target on the next pipeline run.

void Pipeline.write(PCollection<?> collection, Target target, Target.WriteMode writeMode)
 Write the contents of the PCollection to the given Target, using the storage format specified by the target and the given WriteMode for cases where the referenced Target already exists.

<T> void Pipeline.writeTextFile(PCollection<T> collection, String pathName)
 A convenience method for writing a text file.

Uses of PCollection in org.apache.crunch.contrib.bloomfilter

Methods in org.apache.crunch.contrib.bloomfilter with parameters of type PCollection

static <T> PObject<org.apache.hadoop.util.bloom.BloomFilter> BloomFilterFactory.createFilter(PCollection<T> collection, BloomFilterFn<T> filterFn)

Uses of PCollection in org.apache.crunch.contrib.text

Methods in org.apache.crunch.contrib.text that return PCollection

static <T> PCollection<T> Parse.parse(String groupName, PCollection<String> input, Extractor<T> extractor)
Parses the lines of the input PCollection<String> and returns a PCollection<T> using the given Extractor<T>.

static <T> PCollection<T> Parse.parse(String groupName, PCollection<String> input, PTypeFamily ptf, Extractor<T> extractor)
Parses the lines of the input PCollection<String> and returns a PCollection<T> using the given Extractor<T> that uses the given PTypeFamily.

Methods in org.apache.crunch.contrib.text with parameters of type PCollection

static <T> PCollection<T> Parse.parse(String groupName, PCollection<String> input, Extractor<T> extractor)
 Parses the lines of the input PCollection<String> and returns a PCollection<T> using the given Extractor<T>.

static <T> PCollection<T> Parse.parse(String groupName, PCollection<String> input, PTypeFamily ptf, Extractor<T> extractor)
 Parses the lines of the input PCollection<String> and returns a PCollection<T> using the given Extractor<T> that uses the given PTypeFamily.

static <K,V> PTable<K,V> Parse.parseTable(String groupName, PCollection<String> input, Extractor<Pair<K,V>> extractor)
 Parses the lines of the input PCollection<String> and returns a PTable<K, V> using the given Extractor<Pair<K, V>>.

static <K,V> PTable<K,V> Parse.parseTable(String groupName, PCollection<String> input, PTypeFamily ptf, Extractor<Pair<K,V>> extractor)
 Parses the lines of the input PCollection<String> and returns a PTable<K, V> using the given Extractor<Pair<K, V>> that uses the given PTypeFamily.

Uses of PCollection in org.apache.crunch.examples

Methods in org.apache.crunch.examples that return PCollection

PCollection<org.apache.hadoop.hbase.client.Put> WordAggregationHBase.createPut(PTable<String,String> extractedText)
Create puts in order to insert them in hbase.

Methods in org.apache.crunch.examples that return PCollection
`PCollection<org.apache.hadoop.hbase.client.Put>`	`WordAggregationHBase.createPut(PTable<String,String> extractedText)` Create puts in order to insert them in hbase.

Uses of PCollection in org.apache.crunch.impl.dist

Methods in org.apache.crunch.impl.dist that return PCollection

<S> PCollection<S> DistributedPipeline.emptyPCollection(PType<S> ptype)


<S> PCollection<S> DistributedPipeline.read(Source<S> source)


PCollection<String> DistributedPipeline.readTextFile(String pathName)

Methods in org.apache.crunch.impl.dist with parameters of type PCollection

<T> ReadableSource<T> DistributedPipeline.getMaterializeSourceTarget(PCollection<T> pcollection)
 Retrieve a ReadableSourceTarget that provides access to the contents of a PCollection.

void DistributedPipeline.write(PCollection<?> pcollection, Target target)


void DistributedPipeline.write(PCollection<?> pcollection, Target target, Target.WriteMode writeMode)


<T> void DistributedPipeline.writeTextFile(PCollection<T> pcollection, String pathName)

Uses of PCollection in org.apache.crunch.impl.dist.collect

Classes in org.apache.crunch.impl.dist.collect that implement PCollection

class BaseDoCollection<S>


class BaseDoTable<K,V>


class BaseGroupedTable<K,V>


class BaseInputCollection<S>


class BaseInputTable<K,V>


class BaseUnionCollection<S>


class BaseUnionTable<K,V>


class EmptyPCollection<T>


class EmptyPTable<K,V>


class PCollectionImpl<S>


class PTableBase<K,V>

Classes in org.apache.crunch.impl.dist.collect that implement PCollection
`class`	`BaseDoCollection<S>`
`class`	`BaseDoTable<K,V>`
`class`	`BaseGroupedTable<K,V>`
`class`	`BaseInputCollection<S>`
`class`	`BaseInputTable<K,V>`
`class`	`BaseUnionCollection<S>`
`class`	`BaseUnionTable<K,V>`
`class`	`EmptyPCollection<T>`
`class`	`EmptyPTable<K,V>`
`class`	`PCollectionImpl<S>`
`class`	`PTableBase<K,V>`

Methods in org.apache.crunch.impl.dist.collect that return PCollection

PCollection<S> PCollectionImpl.aggregate(Aggregator<S> aggregator)


PCollection<S> PCollectionImpl.cache()


PCollection<S> PCollectionImpl.cache(CachingOptions options)


PCollection<S> PCollectionImpl.filter(FilterFn<S> filterFn)


PCollection<S> PCollectionImpl.filter(String name, FilterFn<S> filterFn)


PCollection<K> PTableBase.keys()


<T> PCollection<T> PCollectionImpl.parallelDo(DoFn<S,T> fn, PType<T> type)


<T> PCollection<T> PCollectionImpl.parallelDo(String name, DoFn<S,T> fn, PType<T> type)


<T> PCollection<T> PCollectionImpl.parallelDo(String name, DoFn<S,T> fn, PType<T> type, ParallelDoOptions options)


PCollection<S> PCollectionImpl.union(PCollection<S>... collections)


PCollection<S> PCollectionImpl.union(PCollection<S> other)


PCollection<V> PTableBase.values()


PCollection<S> PCollectionImpl.write(Target target)


PCollection<S> PCollectionImpl.write(Target target, Target.WriteMode writeMode)

Methods in org.apache.crunch.impl.dist.collect with parameters of type PCollection

PCollection<S> PCollectionImpl.union(PCollection<S>... collections)

PCollection<S> PCollectionImpl.union(PCollection<S> other)

Methods in org.apache.crunch.impl.dist.collect with parameters of type PCollection
`PCollection<S>`	`PCollectionImpl.union(PCollection<S>... collections)`
`PCollection<S>`	`PCollectionImpl.union(PCollection<S> other)`

Uses of PCollection in org.apache.crunch.impl.mem

Methods in org.apache.crunch.impl.mem that return PCollection

static <T> PCollection<T> MemPipeline.collectionOf(Iterable<T> collect)


static <T> PCollection<T> MemPipeline.collectionOf(T... ts)


<T> PCollection<T> MemPipeline.emptyPCollection(PType<T> ptype)


<T> PCollection<T> MemPipeline.read(Source<T> source)


PCollection<String> MemPipeline.readTextFile(String pathName)


static <T> PCollection<T> MemPipeline.typedCollectionOf(PType<T> ptype, Iterable<T> collect)


static <T> PCollection<T> MemPipeline.typedCollectionOf(PType<T> ptype, T... ts)

Methods in org.apache.crunch.impl.mem with parameters of type PCollection

<T> void MemPipeline.cache(PCollection<T> pcollection, CachingOptions options)


<T> Iterable<T> MemPipeline.materialize(PCollection<T> pcollection)


void MemPipeline.write(PCollection<?> collection, Target target)


void MemPipeline.write(PCollection<?> collection, Target target, Target.WriteMode writeMode)


<T> void MemPipeline.writeTextFile(PCollection<T> collection, String pathName)

Uses of PCollection in org.apache.crunch.impl.mr

Methods in org.apache.crunch.impl.mr with parameters of type PCollection

<T> void MRPipeline.cache(PCollection<T> pcollection, CachingOptions options)

<T> Iterable<T> MRPipeline.materialize(PCollection<T> pcollection)

Uses of PCollection in org.apache.crunch.impl.spark

Methods in org.apache.crunch.impl.spark that return PCollection

<S> PCollection<S> SparkPipeline.emptyPCollection(PType<S> ptype)

Methods in org.apache.crunch.impl.spark with parameters of type PCollection

<T> void SparkPipeline.cache(PCollection<T> pcollection, CachingOptions options)


org.apache.spark.storage.StorageLevel SparkRuntime.getStorageLevel(PCollection<?> pcollection)


<T> Iterable<T> SparkPipeline.materialize(PCollection<T> pcollection)

Constructor parameters in org.apache.crunch.impl.spark with type arguments of type PCollection

SparkRuntime(SparkPipeline pipeline, org.apache.spark.api.java.JavaSparkContext sparkContext, org.apache.hadoop.conf.Configuration conf, Map<PCollectionImpl<?>,Set<Target>> outputTargets, Map<PCollectionImpl<?>,org.apache.crunch.materialize.MaterializableIterable> toMaterialize, Map<PCollection<?>,org.apache.spark.storage.StorageLevel> toCache)

Uses of PCollection in org.apache.crunch.impl.spark.collect

Classes in org.apache.crunch.impl.spark.collect that implement PCollection

class DoCollection<S>


class DoTable<K,V>


class InputCollection<S>


class InputTable<K,V>


class PGroupedTableImpl<K,V>


class UnionCollection<S>


class UnionTable<K,V>

Uses of PCollection in org.apache.crunch.lib

Methods in org.apache.crunch.lib that return PCollection

static <S> PCollection<S> Aggregate.aggregate(PCollection<S> collect, Aggregator<S> aggregator)


static <T> PCollection<Tuple3<T,T,T>> Set.comm(PCollection<T> coll1, PCollection<T> coll2)
 Find the elements that are common to two sets, like the Unix comm utility.

static <U,V> PCollection<Pair<U,V>> Cartesian.cross(PCollection left, PCollection<V> right)
 Performs a full cross join on the specified PCollections (using the same strategy as Pig's CROSS operator).

static <U,V> PCollection<Pair<U,V>> Cartesian.cross(PCollection left, PCollection<V> right, int parallelism)
 Performs a full cross join on the specified PCollections (using the same strategy as Pig's CROSS operator).

static <T> PCollection<T> Set.difference(PCollection<T> coll1, PCollection<T> coll2)
 Compute the set difference between two sets of elements.

static <S> PCollection<S> Distinct.distinct(PCollection<S> input)
 Construct a new PCollection that contains the unique elements of a given input PCollection.

static <S> PCollection<S> Distinct.distinct(PCollection<S> input, int flushEvery)
 A distinct operation that gives the client more control over how frequently elements are flushed to disk in order to allow control over performance or memory consumption.

static <T,N extends Number> PCollection<Pair<Integer,T>> Sample.groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input, int[] sampleSizes)
 The most general purpose of the weighted reservoir sampling patterns that allows us to choose a random sample of elements for each of N input groups.

static <T,N extends Number> PCollection<Pair<Integer,T>> Sample.groupedWeightedReservoirSample(PTable<Integer,Pair<T,N>> input, int[] sampleSizes, Long seed)
 Same as the other groupedWeightedReservoirSample method, but include a seed for testing purposes.

static <T> PCollection<T> Set.intersection(PCollection<T> coll1, PCollection<T> coll2)
 Compute the intersection of two sets of elements.

static <K,V> PCollection<K> PTables.keys(PTable<K,V> ptable)
 Extract the keys from the given PTable<K, V> as a PCollection<K>.

static <T> PCollection<T> Sample.reservoirSample(PCollection<T> input, int sampleSize)
 Select a fixed number of elements from the given PCollection with each element equally likely to be included in the sample.

static <T> PCollection<T> Sample.reservoirSample(PCollection<T> input, int sampleSize, Long seed)
 A version of the reservoir sampling algorithm that uses a given seed, primarily for testing purposes.

static <S> PCollection<S> Sample.sample(PCollection<S> input, double probability)
 Output records from the given PCollection with the given probability.

static <S> PCollection<S> Sample.sample(PCollection<S> input, Long seed, double probability)
 Output records from the given PCollection using a given seed.

static <T> PCollection<T> Shard.shard(PCollection<T> pc, int numPartitions)
 Creates a PCollection<T> that has the same contents as its input argument but will be written to a fixed number of output files.

static <T> PCollection<T> Sort.sort(PCollection<T> collection)
 Sorts the PCollection using the natural ordering of its elements in ascending order.

static <T> PCollection<T> Sort.sort(PCollection<T> collection, int numReducers, Sort.Order order)
 Sorts the PCollection using the natural ordering of its elements in the order specified using the given number of reducers.

static <T> PCollection<T> Sort.sort(PCollection<T> collection, Sort.Order order)
 Sorts the PCollection using the natural order of its elements with the given Order.

static <K,V1,V2,T> PCollection<T> SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input, DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn, PType<T> ptype)
 Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PCollection<T>.

static <K,V1,V2,T> PCollection<T> SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input, DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn, PType<T> ptype, int numReducers)
 Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PCollection<T>, using the given number of reducers.

static <U,V> PCollection<Pair<U,V>> Sort.sortPairs(PCollection<Pair<U,V>> collection, Sort.ColumnOrder... columnOrders)
 Sorts the PCollection of Pairs using the specified column ordering.

static <V1,V2,V3,V4> PCollection<Tuple4<V1,V2,V3,V4>> Sort.sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection, Sort.ColumnOrder... columnOrders)
 Sorts the PCollection of Tuple4s using the specified column ordering.

static <V1,V2,V3> PCollection<Tuple3<V1,V2,V3>> Sort.sortTriples(PCollection<Tuple3<V1,V2,V3>> collection, Sort.ColumnOrder... columnOrders)
 Sorts the PCollection of Tuple3s using the specified column ordering.

static <T extends Tuple> PCollection<T> Sort.sortTuples(PCollection<T> collection, int numReducers, Sort.ColumnOrder... columnOrders)
 Sorts the PCollection of TupleNs using the specified column ordering and a client-specified number of reducers.

static <T extends Tuple> PCollection<T> Sort.sortTuples(PCollection<T> collection, Sort.ColumnOrder... columnOrders)
 Sorts the PCollection of tuples using the specified column ordering.

static <K,V> PCollection<V> PTables.values(PTable<K,V> ptable)
 Extract the values from the given PTable<K, V> as a PCollection<V>.

static <T,N extends Number> PCollection<T> Sample.weightedReservoirSample(PCollection<Pair<T,N>> input, int sampleSize)
 Selects a weighted sample of the elements of the given PCollection, where the second term in the input Pair is a numerical weight.

static <T,N extends Number> PCollection<T> Sample.weightedReservoirSample(PCollection<Pair<T,N>> input, int sampleSize, Long seed)
 The weighted reservoir sampling function with the seed term exposed for testing purposes.

Methods in org.apache.crunch.lib that return types with arguments of type PCollection

static <T,U> Pair<PCollection<T>,PCollection> Channels.split(PCollection<Pair<T,U>> pCollection)
 Splits a PCollection of any Pair of objects into a Pair of PCollection}, to allow for the output of a DoFn to be handled using separate channels.

static <T,U> Pair<PCollection<T>,PCollection> Channels.split(PCollection<Pair<T,U>> pCollection)
 Splits a PCollection of any Pair of objects into a Pair of PCollection}, to allow for the output of a DoFn to be handled using separate channels.

static <T,U> Pair<PCollection<T>,PCollection> Channels.split(PCollection<Pair<T,U>> pCollection, PType<T> firstPType, PType secondPType)
 Splits a PCollection of any Pair of objects into a Pair of PCollection}, to allow for the output of a DoFn to be handled using separate channels.

static <T,U> Pair<PCollection<T>,PCollection> Channels.split(PCollection<Pair<T,U>> pCollection, PType<T> firstPType, PType secondPType)
 Splits a PCollection of any Pair of objects into a Pair of PCollection}, to allow for the output of a DoFn to be handled using separate channels.

Methods in org.apache.crunch.lib with parameters of type PCollection

static <S> PCollection<S> Aggregate.aggregate(PCollection<S> collect, Aggregator<S> aggregator)


static <K,V> PTable<K,V> PTables.asPTable(PCollection<Pair<K,V>> pcollect)
 Convert the given PCollection<Pair<K, V>> to a PTable<K, V>.

static <T> PCollection<Tuple3<T,T,T>> Set.comm(PCollection<T> coll1, PCollection<T> coll2)
 Find the elements that are common to two sets, like the Unix comm utility.

static <T> PCollection<Tuple3<T,T,T>> Set.comm(PCollection<T> coll1, PCollection<T> coll2)
 Find the elements that are common to two sets, like the Unix comm utility.

static <S> PTable<S,Long> Aggregate.count(PCollection<S> collect)
 Returns a PTable that contains the unique elements of this collection mapped to a count of their occurrences.

static <S> PTable<S,Long> Aggregate.count(PCollection<S> collect, int numPartitions)
 Returns a PTable that contains the unique elements of this collection mapped to a count of their occurrences.

static <U,V> PCollection<Pair<U,V>> Cartesian.cross(PCollection left, PCollection<V> right)
 Performs a full cross join on the specified PCollections (using the same strategy as Pig's CROSS operator).

static <U,V> PCollection<Pair<U,V>> Cartesian.cross(PCollection left, PCollection<V> right)
 Performs a full cross join on the specified PCollections (using the same strategy as Pig's CROSS operator).

static <U,V> PCollection<Pair<U,V>> Cartesian.cross(PCollection left, PCollection<V> right, int parallelism)
 Performs a full cross join on the specified PCollections (using the same strategy as Pig's CROSS operator).

static <U,V> PCollection<Pair<U,V>> Cartesian.cross(PCollection left, PCollection<V> right, int parallelism)
 Performs a full cross join on the specified PCollections (using the same strategy as Pig's CROSS operator).

static <T> PCollection<T> Set.difference(PCollection<T> coll1, PCollection<T> coll2)
 Compute the set difference between two sets of elements.

static <T> PCollection<T> Set.difference(PCollection<T> coll1, PCollection<T> coll2)
 Compute the set difference between two sets of elements.

static <S> PCollection<S> Distinct.distinct(PCollection<S> input)
 Construct a new PCollection that contains the unique elements of a given input PCollection.

static <S> PCollection<S> Distinct.distinct(PCollection<S> input, int flushEvery)
 A distinct operation that gives the client more control over how frequently elements are flushed to disk in order to allow control over performance or memory consumption.

static <T> PCollection<T> Set.intersection(PCollection<T> coll1, PCollection<T> coll2)
 Compute the intersection of two sets of elements.

static <T> PCollection<T> Set.intersection(PCollection<T> coll1, PCollection<T> coll2)
 Compute the intersection of two sets of elements.

static <S> PObject<Long> Aggregate.length(PCollection<S> collect)
 Returns the number of elements in the provided PCollection.

static <S> PObject<S> Aggregate.max(PCollection<S> collect)
 Returns the largest numerical element from the input collection.

static <S> PObject<S> Aggregate.min(PCollection<S> collect)
 Returns the smallest numerical element from the input collection.

static <T> PCollection<T> Sample.reservoirSample(PCollection<T> input, int sampleSize)
 Select a fixed number of elements from the given PCollection with each element equally likely to be included in the sample.

static <T> PCollection<T> Sample.reservoirSample(PCollection<T> input, int sampleSize, Long seed)
 A version of the reservoir sampling algorithm that uses a given seed, primarily for testing purposes.

static <S> PCollection<S> Sample.sample(PCollection<S> input, double probability)
 Output records from the given PCollection with the given probability.

static <S> PCollection<S> Sample.sample(PCollection<S> input, Long seed, double probability)
 Output records from the given PCollection using a given seed.

static <T> PCollection<T> Shard.shard(PCollection<T> pc, int numPartitions)
 Creates a PCollection<T> that has the same contents as its input argument but will be written to a fixed number of output files.

static <T> PCollection<T> Sort.sort(PCollection<T> collection)
 Sorts the PCollection using the natural ordering of its elements in ascending order.

static <T> PCollection<T> Sort.sort(PCollection<T> collection, int numReducers, Sort.Order order)
 Sorts the PCollection using the natural ordering of its elements in the order specified using the given number of reducers.

static <T> PCollection<T> Sort.sort(PCollection<T> collection, Sort.Order order)
 Sorts the PCollection using the natural order of its elements with the given Order.

static <U,V> PCollection<Pair<U,V>> Sort.sortPairs(PCollection<Pair<U,V>> collection, Sort.ColumnOrder... columnOrders)
 Sorts the PCollection of Pairs using the specified column ordering.

static <V1,V2,V3,V4> PCollection<Tuple4<V1,V2,V3,V4>> Sort.sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection, Sort.ColumnOrder... columnOrders)
 Sorts the PCollection of Tuple4s using the specified column ordering.

static <V1,V2,V3> PCollection<Tuple3<V1,V2,V3>> Sort.sortTriples(PCollection<Tuple3<V1,V2,V3>> collection, Sort.ColumnOrder... columnOrders)
 Sorts the PCollection of Tuple3s using the specified column ordering.

static <T extends Tuple> PCollection<T> Sort.sortTuples(PCollection<T> collection, int numReducers, Sort.ColumnOrder... columnOrders)
 Sorts the PCollection of TupleNs using the specified column ordering and a client-specified number of reducers.

static <T extends Tuple> PCollection<T> Sort.sortTuples(PCollection<T> collection, Sort.ColumnOrder... columnOrders)
 Sorts the PCollection of tuples using the specified column ordering.

static <T,U> Pair<PCollection<T>,PCollection> Channels.split(PCollection<Pair<T,U>> pCollection)
 Splits a PCollection of any Pair of objects into a Pair of PCollection}, to allow for the output of a DoFn to be handled using separate channels.

static <T,U> Pair<PCollection<T>,PCollection> Channels.split(PCollection<Pair<T,U>> pCollection, PType<T> firstPType, PType secondPType)
 Splits a PCollection of any Pair of objects into a Pair of PCollection}, to allow for the output of a DoFn to be handled using separate channels.

static <T,N extends Number> PCollection<T> Sample.weightedReservoirSample(PCollection<Pair<T,N>> input, int sampleSize)
 Selects a weighted sample of the elements of the given PCollection, where the second term in the input Pair is a numerical weight.

static <T,N extends Number> PCollection<T> Sample.weightedReservoirSample(PCollection<Pair<T,N>> input, int sampleSize, Long seed)
 The weighted reservoir sampling function with the seed term exposed for testing purposes.

Uses of PCollection in org.apache.crunch.lib.join

Methods in org.apache.crunch.lib.join that return PCollection

static <K,U,V,T> PCollection<T> OneToManyJoin.oneToManyJoin(PTable<K,U> left, PTable<K,V> right, DoFn<Pair<U,Iterable<V>>,T> postProcessFn, PType<T> ptype)
Performs a join on two tables, where the left table only contains a single value per key.

static <K,U,V,T> PCollection<T> OneToManyJoin.oneToManyJoin(PTable<K,U> left, PTable<K,V> right, DoFn<Pair<U,Iterable<V>>,T> postProcessFn, PType<T> ptype, int numReducers)
Supports a user-specified number of reducers for the one-to-many join.

Uses of PCollection in org.apache.crunch.util

Methods in org.apache.crunch.util that return PCollection

<T> PCollection<T> CrunchTool.read(Source<T> source)

PCollection<String> CrunchTool.readTextFile(String pathName)

Methods in org.apache.crunch.util with parameters of type PCollection

static <T> int PartitionUtils.getRecommendedPartitions(PCollection<T> pcollection)


static <T> int PartitionUtils.getRecommendedPartitions(PCollection<T> pcollection, org.apache.hadoop.conf.Configuration conf)


<T> Iterable<T> CrunchTool.materialize(PCollection<T> pcollection)


void CrunchTool.write(PCollection<?> pcollection, Target target)


void CrunchTool.writeTextFile(PCollection<?> pcollection, String pathName)