public interface PCollection<S>
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
| Method Summary | ||
|---|---|---|
PCollection<S> |
aggregate(Aggregator<S> aggregator)
Returns a PCollection that contains the result of aggregating all values in this instance. |
|
PObject<Collection<S>> |
asCollection()
|
|
ReadableData<S> |
asReadable(boolean materialize)
|
|
<K> PTable<K,S> |
by(MapFn<S,K> extractKeyFn,
PType<K> keyType)
Apply the given map function to each element of this instance in order to create a PTable. |
|
<K> PTable<K,S> |
by(String name,
MapFn<S,K> extractKeyFn,
PType<K> keyType)
Apply the given map function to each element of this instance in order to create a PTable. |
|
PCollection<S> |
cache()
Marks this data as cached using the default CachingOptions. |
|
PCollection<S> |
cache(CachingOptions options)
Marks this data as cached using the given CachingOptions. |
|
PTable<S,Long> |
count()
Returns a PTable instance that contains the counts of each unique
element of this PCollection. |
|
PCollection<S> |
filter(FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting PCollection. |
|
PCollection<S> |
filter(String name,
FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting PCollection. |
|
PObject<S> |
first()
|
|
String |
getName()
Returns a shorthand name for this PCollection. |
|
Pipeline |
getPipeline()
Returns the Pipeline associated with this PCollection. |
|
PType<S> |
getPType()
Returns the PType of this PCollection. |
|
long |
getSize()
Returns the size of the data represented by this PCollection in
bytes. |
|
PTypeFamily |
getTypeFamily()
Returns the PTypeFamily of this PCollection. |
|
PObject<Long> |
length()
Returns the number of elements represented by this PCollection. |
|
Iterable<S> |
materialize()
Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally. |
|
PObject<S> |
max()
Returns a PObject of the maximum element of this instance. |
|
PObject<S> |
min()
Returns a PObject of the minimum element of this instance. |
|
<K,V> PTable<K,V> |
parallelDo(DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type)
Similar to the other parallelDo instance, but returns a
PTable instance instead of a PCollection. |
|
<T> PCollection<T> |
parallelDo(DoFn<S,T> doFn,
PType<T> type)
Applies the given doFn to the elements of this PCollection and
returns a new PCollection that is the output of this processing. |
|
<K,V> PTable<K,V> |
parallelDo(String name,
DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type)
Similar to the other parallelDo instance, but returns a
PTable instance instead of a PCollection. |
|
<K,V> PTable<K,V> |
parallelDo(String name,
DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type,
ParallelDoOptions options)
Similar to the other parallelDo instance, but returns a
PTable instance instead of a PCollection. |
|
<T> PCollection<T> |
parallelDo(String name,
DoFn<S,T> doFn,
PType<T> type)
Applies the given doFn to the elements of this PCollection and
returns a new PCollection that is the output of this processing. |
|
<T> PCollection<T> |
parallelDo(String name,
DoFn<S,T> doFn,
PType<T> type,
ParallelDoOptions options)
Applies the given doFn to the elements of this PCollection and
returns a new PCollection that is the output of this processing. |
|
PCollection<S> |
union(PCollection<S>... collections)
Returns a PCollection instance that acts as the union of this
PCollection and the input PCollections. |
|
PCollection<S> |
union(PCollection<S> other)
Returns a PCollection instance that acts as the union of this
PCollection and the given PCollection. |
|
PCollection<S> |
write(Target target)
Write the contents of this PCollection to the given Target,
using the storage format specified by the target. |
|
PCollection<S> |
write(Target target,
Target.WriteMode writeMode)
Write the contents of this PCollection to the given Target,
using the given Target.WriteMode to handle existing
targets. |
|
| Method Detail |
|---|
Pipeline getPipeline()
Returns the Pipeline associated with this PCollection.
PCollection<S> union(PCollection<S> other)
Returns a PCollection instance that acts as the union of this
PCollection and the given PCollection.
PCollection<S> union(PCollection<S>... collections)
Returns a PCollection instance that acts as the union of this
PCollection and the input PCollections.
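As a rough mental model of union, the plain-Java sketch below concatenates in-memory lists. It assumes duplicates are retained, which the descriptions above do not state either way. `UnionSketch` and its `union` helper are hypothetical stand-ins built on java.util, not Crunch code, and they ignore distribution and deferred execution.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for PCollection.union; not Crunch code.
class UnionSketch {
    // Concatenates the receiver with every other collection.
    // Assumption: duplicates are kept (bag semantics).
    @SafeVarargs
    static <S> List<S> union(List<S> self, List<S>... others) {
        List<S> out = new ArrayList<>(self);
        for (List<S> other : others) {
            out.addAll(other);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> merged = union(List.of("a", "b"), List.of("b"), List.of("c"));
        System.out.println(merged);
    }
}
```

The inputs are left untouched and a new list is returned, mirroring the immutability of a PCollection.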
<T> PCollection<T> parallelDo(DoFn<S,T> doFn,
PType<T> type)
Applies the given doFn to the elements of this PCollection and
returns a new PCollection that is the output of this processing.
Parameters:
doFn - The DoFn to apply
type - The PType of the resulting PCollection
Returns: a new PCollection
<T> PCollection<T> parallelDo(String name,
DoFn<S,T> doFn,
PType<T> type)
Applies the given doFn to the elements of this PCollection and
returns a new PCollection that is the output of this processing.
Parameters:
name - An identifier for this processing step, useful for debugging
doFn - The DoFn to apply
type - The PType of the resulting PCollection
Returns: a new PCollection
<T> PCollection<T> parallelDo(String name,
DoFn<S,T> doFn,
PType<T> type,
ParallelDoOptions options)
Applies the given doFn to the elements of this PCollection and
returns a new PCollection that is the output of this processing.
Parameters:
name - An identifier for this processing step, useful for debugging
doFn - The DoFn to apply
type - The PType of the resulting PCollection
options - Optional information that is needed for certain pipeline operations
Returns: a new PCollection
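To make the "apply a doFn to every element" idea concrete, here is a minimal in-memory sketch in plain Java. `ParallelDoSketch`, `MiniDoFn`, and the emitter-style callback are hypothetical names chosen for illustration, and the callback shape is an assumption about how a DoFn produces output. The sketch leaves out the PType argument and everything related to distributed execution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory model of parallelDo; not the Crunch API.
class ParallelDoSketch {
    // Stand-in for a DoFn<S, T>: handles one input and may emit
    // zero, one, or many outputs through the emitter callback.
    interface MiniDoFn<S, T> {
        void process(S input, Consumer<T> emitter);
    }

    // Runs the function over every element and gathers what it emits.
    static <S, T> List<T> parallelDo(List<S> input, MiniDoFn<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S element : input) {
            fn.process(element, out::add);
        }
        return out;
    }

    // Example function: split each line into its non-empty words.
    static List<String> splitWords(List<String> lines) {
        return parallelDo(lines, (String line, Consumer<String> emitter) -> {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    emitter.accept(word);
                }
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(splitWords(List.of("to be", "or not", "")));
    }
}
```

Because one input can yield any number of outputs, the result may be longer or shorter than the input; here the empty line contributes nothing.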
<K,V> PTable<K,V> parallelDo(DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type)
Similar to the other parallelDo instance, but returns a
PTable instance instead of a PCollection.
Parameters:
doFn - The DoFn to apply
type - The PTableType of the resulting PTable
Returns: a PTable instance
<K,V> PTable<K,V> parallelDo(String name,
DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type)
Similar to the other parallelDo instance, but returns a
PTable instance instead of a PCollection.
Parameters:
name - An identifier for this processing step
doFn - The DoFn to apply
type - The PTableType of the resulting PTable
Returns: a PTable instance
<K,V> PTable<K,V> parallelDo(String name,
DoFn<S,Pair<K,V>> doFn,
PTableType<K,V> type,
ParallelDoOptions options)
Similar to the other parallelDo instance, but returns a
PTable instance instead of a PCollection.
Parameters:
name - An identifier for this processing step
doFn - The DoFn to apply
type - The PTableType of the resulting PTable
options - Optional information that is needed for certain pipeline operations
Returns: a PTable instance
PCollection<S> write(Target target)
Write the contents of this PCollection to the given Target,
using the storage format specified by the target.
Parameters:
target - The target to write to
PCollection<S> write(Target target,
Target.WriteMode writeMode)
Write the contents of this PCollection to the given Target,
using the given Target.WriteMode to handle existing
targets.
Parameters:
target - The target
writeMode - The rule for handling existing outputs at the target location
Iterable<S> materialize()
Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally.
PCollection<S> cache()
Marks this data as cached using the default CachingOptions. Cached PCollections will only
be processed once, and then their contents will be saved so that downstream code can process them many times.
Returns: this PCollection instance
PCollection<S> cache(CachingOptions options)
Marks this data as cached using the given CachingOptions. Cached PCollections will only
be processed once, and then their contents will be saved so that downstream code can process them many times.
Parameters:
options - The options that control the cache settings for the data
Returns: this PCollection instance
PObject<Collection<S>> asCollection()
Returns: a PObject encapsulating an in-memory Collection containing the values
of this PCollection.
PObject<S> first()
Returns: a PObject of the first element of this PCollection.
ReadableData<S> asReadable(boolean materialize)
Parameters:
materialize - If true, materialize this data before returning a reference to it
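PType<S> getPType()
Returns the PType of this PCollection.
PTypeFamily getTypeFamily()
Returns the PTypeFamily of this PCollection.
long getSize()
Returns the size of the data represented by this PCollection in
bytes.
PObject<Long> length()
Returns the number of elements represented by this PCollection.
Returns: a PObject containing the number of elements in this PCollection.
String getName()
Returns a shorthand name for this PCollection.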
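PCollection<S> filter(FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting PCollection.
PCollection<S> filter(String name,
FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting PCollection.
Parameters:
name - An identifier for this processing step
filterFn - The FilterFn to apply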
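The filtering described above can be pictured with the following plain-Java sketch. `FilterSketch` is a hypothetical in-memory helper, and `java.util.function.Predicate` stands in for a FilterFn on the assumption that a filter function is a yes/no test per element. None of this is Crunch code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory model of PCollection.filter; not Crunch code.
class FilterSketch {
    // Keeps only the elements accepted by the predicate and returns
    // them as a new list; the input list is never modified.
    static <S> List<S> filter(List<S> input, Predicate<S> accept) {
        List<S> out = new ArrayList<>();
        for (S element : input) {
            if (accept.test(element)) {
                out.add(element);
            }
        }
        return out;
    }

    // Example filter: keep the even numbers.
    static List<Integer> evens(List<Integer> numbers) {
        return filter(numbers, n -> n % 2 == 0);
    }

    public static void main(String[] args) {
        System.out.println(evens(List.of(1, 2, 3, 4)));
    }
}
```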
<K> PTable<K,S> by(MapFn<S,K> extractKeyFn,
PType<K> keyType)
Apply the given map function to each element of this instance in order to create a PTable.
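One way to picture the key extraction performed by by is as a list of key/element pairs, sketched here in plain Java. `BySketch` is hypothetical, `java.util.function.Function` stands in for the MapFn, and a list of `Map.Entry` stands in for the PTable. This is an in-memory analogy only, and it omits the PType argument.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory model of PCollection.by; not Crunch code.
class BySketch {
    // Pairs every element with the key extracted from it, so the
    // original element becomes the value of the resulting entry.
    static <S, K> List<Map.Entry<K, S>> by(List<S> input, Function<S, K> extractKey) {
        List<Map.Entry<K, S>> out = new ArrayList<>();
        for (S element : input) {
            out.add(new SimpleImmutableEntry<>(extractKey.apply(element), element));
        }
        return out;
    }

    // Example: key each word by its length.
    static List<Map.Entry<Integer, String>> byLength(List<String> words) {
        return by(words, String::length);
    }

    public static void main(String[] args) {
        System.out.println(byLength(List.of("ox", "cat")));
    }
}
```

Note that the key type comes first and the element type second, matching the PTable<K,S> return type in the signature.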
<K> PTable<K,S> by(String name,
MapFn<S,K> extractKeyFn,
PType<K> keyType)
Apply the given map function to each element of this instance in order to create a PTable.
Parameters:
name - An identifier for this processing step
extractKeyFn - The MapFn to apply
PTable<S,Long> count()
Returns a PTable instance that contains the counts of each unique
element of this PCollection.
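The shape of the count result, one entry per unique element with a Long tally, can be sketched in plain Java as below. `CountSketch` is a hypothetical helper and a `java.util.Map` stands in for the PTable<S,Long>; the real operation runs over distributed data rather than an in-memory list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of PCollection.count; not Crunch code.
class CountSketch {
    // Tallies how many times each distinct element occurs.
    static <S> Map<S, Long> count(List<S> input) {
        Map<S, Long> counts = new LinkedHashMap<>();
        for (S element : input) {
            // Start at 1 for a new element, otherwise add 1 to its tally.
            counts.merge(element, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("a", "b", "a")));
    }
}
```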
PObject<S> max()
Returns a PObject of the maximum element of this instance.
PObject<S> min()
Returns a PObject of the minimum element of this instance.
PCollection<S> aggregate(Aggregator<S> aggregator)
Returns a PCollection that contains the result of aggregating all values in this instance.
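The sketch below illustrates the idea of folding every value through an aggregator and getting a collection back rather than a bare scalar. `AggregateSketch`, `MiniAggregator`, and `SumLongs` are hypothetical, and the reset/update/results shape is an assumption about what an aggregator looks like, not something stated on this page. It is plain in-memory Java, not Crunch code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of PCollection.aggregate; not Crunch code.
class AggregateSketch {
    // Assumed aggregator shape: clear state, absorb values, report results.
    interface MiniAggregator<T> {
        void reset();
        void update(T value);
        Iterable<T> results();
    }

    // Example aggregator that sums Long values.
    static class SumLongs implements MiniAggregator<Long> {
        private long sum;
        public void reset() { sum = 0L; }
        public void update(Long value) { sum += value; }
        public Iterable<Long> results() { return List.of(sum); }
    }

    // Feeds every element to the aggregator, then collects its results
    // into a new list that plays the role of the returned collection.
    static <T> List<T> aggregate(List<T> input, MiniAggregator<T> aggregator) {
        aggregator.reset();
        for (T value : input) {
            aggregator.update(value);
        }
        List<T> out = new ArrayList<>();
        for (T result : aggregator.results()) {
            out.add(result);
        }
        return out;
    }

    static List<Long> sum(List<Long> values) {
        return aggregate(values, new SumLongs());
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1L, 2L, 3L)));
    }
}
```

The input and output share the same element type, which matches the PCollection<S> to PCollection<S> signature of aggregate.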