public interface PCollection<S>
Modifier and Type | Method and Description |
---|---|
PObject<Collection<S>> | asCollection() |
<K> PTable<K,S> | by(MapFn<S,K> extractKeyFn, PType<K> keyType) Apply the given map function to each element of this instance in order to create a PTable. |
<K> PTable<K,S> | by(String name, MapFn<S,K> extractKeyFn, PType<K> keyType) Apply the given map function to each element of this instance in order to create a PTable. |
PTable<S,Long> | count() Returns a PTable instance that contains the counts of each unique element of this PCollection. |
PCollection<S> | filter(FilterFn<S> filterFn) Apply the given filter function to this instance and return the resulting PCollection. |
PCollection<S> | filter(String name, FilterFn<S> filterFn) Apply the given filter function to this instance and return the resulting PCollection. |
String | getName() Returns a shorthand name for this PCollection. |
Pipeline | getPipeline() Returns the Pipeline associated with this PCollection. |
PType<S> | getPType() Returns the PType of this PCollection. |
long | getSize() Returns the size of the data represented by this PCollection in bytes. |
PTypeFamily | getTypeFamily() Returns the PTypeFamily of this PCollection. |
PObject<Long> | length() Returns the number of elements represented by this PCollection. |
Iterable<S> | materialize() Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally. |
PObject<S> | max() Returns a PObject of the maximum element of this instance. |
PObject<S> | min() Returns a PObject of the minimum element of this instance. |
<K,V> PTable<K,V> | parallelDo(DoFn<S,Pair<K,V>> doFn, PTableType<K,V> type) Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection. |
<T> PCollection<T> | parallelDo(DoFn<S,T> doFn, PType<T> type) Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing. |
<K,V> PTable<K,V> | parallelDo(String name, DoFn<S,Pair<K,V>> doFn, PTableType<K,V> type) Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection. |
<K,V> PTable<K,V> | parallelDo(String name, DoFn<S,Pair<K,V>> doFn, PTableType<K,V> type, ParallelDoOptions options) Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection. |
<T> PCollection<T> | parallelDo(String name, DoFn<S,T> doFn, PType<T> type) Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing. |
<T> PCollection<T> | parallelDo(String name, DoFn<S,T> doFn, PType<T> type, ParallelDoOptions options) Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing. |
PCollection<S> | union(PCollection<S>... collections) Returns a PCollection instance that acts as the union of this PCollection and the input PCollections. |
PCollection<S> | write(Target target) Write the contents of this PCollection to the given Target, using the storage format specified by the target. |
PCollection<S> | write(Target target, Target.WriteMode writeMode) Write the contents of this PCollection to the given Target, using the given Target.WriteMode to handle existing targets. |
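
As a rough illustration of what the filter, by, and count entries in the summary describe, the following plain-Java sketch mimics them on an in-memory List. It is an analogy only: the class and helper names are hypothetical, java.util types stand in for PCollection and PTable, and nothing here is Crunch code or reflects how Crunch executes these operations.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// In-memory analogy only: these helpers are hypothetical and are not part of Crunch.
final class CollectionOpsSketch {

    // Analogy for filter(FilterFn): keep only the elements the predicate accepts.
    static <S> List<S> filter(List<S> input, Predicate<S> accept) {
        List<S> out = new ArrayList<>();
        for (S element : input) {
            if (accept.test(element)) {
                out.add(element);
            }
        }
        return out;
    }

    // Analogy for by(MapFn, PType): pair each element with the key extracted from it.
    static <S, K> List<Map.Entry<K, S>> by(List<S> input, Function<S, K> extractKey) {
        List<Map.Entry<K, S>> out = new ArrayList<>();
        for (S element : input) {
            out.add(new SimpleImmutableEntry<>(extractKey.apply(element), element));
        }
        return out;
    }

    // Analogy for count(): map each distinct element to its number of occurrences.
    static <S> Map<S, Long> count(List<S> input) {
        Map<S, Long> counts = new LinkedHashMap<>();
        for (S element : input) {
            counts.merge(element, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "bee", "apple", "cat");
        System.out.println(filter(words, w -> w.length() == 3)); // [bee, cat]
        System.out.println(by(words, String::length));           // [5=apple, 3=bee, 5=apple, 3=cat]
        System.out.println(count(words));                        // {apple=2, bee=1, cat=1}
    }
}
```

Note that the by analogy produces one key/value pair per input element, so the same key can appear more than once.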

Pipeline getPipeline()
Returns the Pipeline associated with this PCollection.

PCollection<S> union(PCollection<S>... collections)
Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.

<T> PCollection<T> parallelDo(DoFn<S,T> doFn, PType<T> type)
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
Parameters:
doFn - The DoFn to apply
type - The PType of the resulting PCollection
Returns:
a new PCollection
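
To make the description of parallelDo concrete, here is a minimal in-memory sketch of applying a function to every element and gathering its output into a new collection. It assumes the function may emit zero or more outputs per input. The BiConsumer standing in for a DoFn, the Consumer standing in for its emitter, and the class and method names are all hypothetical, chosen for this illustration rather than taken from Crunch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// In-memory analogy only: not Crunch code.
final class ParallelDoSketch {

    // Analogy for parallelDo(DoFn<S,T>, PType<T>): run doFn on every input element
    // and collect whatever it emits. The BiConsumer stands in for a DoFn and the
    // Consumer<T> for its emitter; both are assumptions of this sketch.
    static <S, T> List<T> parallelDo(List<S> input, BiConsumer<S, Consumer<T>> doFn) {
        List<T> output = new ArrayList<>();
        for (S element : input) {
            doFn.accept(element, output::add);
        }
        return output;
    }

    // Example function: emit each whitespace-separated token of a line.
    // An empty line emits nothing.
    static void splitWords(String line, Consumer<String> emitter) {
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                emitter.accept(token);
            }
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b", "", "c");
        List<String> words = parallelDo(lines, ParallelDoSketch::splitWords);
        System.out.println(words); // [a, b, c]
    }
}
```

The input list is left untouched and the result is a separate list, mirroring the "returns a new PCollection" wording above.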

<T> PCollection<T> parallelDo(String name, DoFn<S,T> doFn, PType<T> type)
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
Parameters:
name - An identifier for this processing step, useful for debugging
doFn - The DoFn to apply
type - The PType of the resulting PCollection
Returns:
a new PCollection

<T> PCollection<T> parallelDo(String name, DoFn<S,T> doFn, PType<T> type, ParallelDoOptions options)
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
Parameters:
name - An identifier for this processing step, useful for debugging
doFn - The DoFn to apply
type - The PType of the resulting PCollection
options - Optional information that is needed for certain pipeline operations
Returns:
a new PCollection

<K,V> PTable<K,V> parallelDo(DoFn<S,Pair<K,V>> doFn, PTableType<K,V> type)
Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection.
Parameters:
doFn - The DoFn to apply
type - The PTableType of the resulting PTable
Returns:
a new PTable

<K,V> PTable<K,V> parallelDo(String name, DoFn<S,Pair<K,V>> doFn, PTableType<K,V> type)
Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection.
Parameters:
name - An identifier for this processing step
doFn - The DoFn to apply
type - The PTableType of the resulting PTable
Returns:
a new PTable

<K,V> PTable<K,V> parallelDo(String name, DoFn<S,Pair<K,V>> doFn, PTableType<K,V> type, ParallelDoOptions options)
Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection.
Parameters:
name - An identifier for this processing step
doFn - The DoFn to apply
type - The PTableType of the resulting PTable
options - Optional information that is needed for certain pipeline operations
Returns:
a new PTable
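
The PTable-returning variants differ from the PCollection ones only in that the function emits key/value pairs. The sketch below pictures that with plain Java collections. It assumes the resulting table keeps every emitted pair, so a key may repeat. Map.Entry standing in for Pair, a List standing in for PTable, and all class and method names are hypothetical and not part of Crunch.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// In-memory analogy only: not Crunch code.
final class TableParallelDoSketch {

    // Analogy for parallelDo(DoFn<S,Pair<K,V>>, PTableType<K,V>): the function emits
    // key/value pairs and the result keeps every pair (an assumption of this sketch).
    static <S, K, V> List<Map.Entry<K, V>> parallelDo(
            List<S> input, BiConsumer<S, Consumer<Map.Entry<K, V>>> doFn) {
        List<Map.Entry<K, V>> table = new ArrayList<>();
        for (S element : input) {
            doFn.accept(element, table::add);
        }
        return table;
    }

    // Example function: key each non-empty word by its first character.
    static void byInitial(String word, Consumer<Map.Entry<Character, String>> emitter) {
        if (!word.isEmpty()) {
            emitter.accept(new SimpleImmutableEntry<>(word.charAt(0), word));
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<Character, String>> table =
                parallelDo(Arrays.asList("ant", "", "bee", "asp"), TableParallelDoSketch::byInitial);
        System.out.println(table); // [a=ant, b=bee, a=asp]
    }
}
```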

PCollection<S> write(Target target)
Write the contents of this PCollection to the given Target, using the storage format specified by the target.
Parameters:
target - The target to write to

PCollection<S> write(Target target, Target.WriteMode writeMode)
Write the contents of this PCollection to the given Target, using the given Target.WriteMode to handle existing targets.
Parameters:
target - The target
writeMode - The rule for handling existing outputs at the target location

Iterable<S> materialize()
Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally.
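
PObject<Collection<S>> asCollection()
Returns a PObject encapsulating an in-memory Collection containing the values of this PCollection.

PTypeFamily getTypeFamily()
Returns the PTypeFamily of this PCollection.

long getSize()
Returns the size of the data represented by this PCollection in bytes.

PObject<Long> length()
Returns the number of elements represented by this PCollection.
Returns:
a PObject containing the number of elements in this PCollection

String getName()
Returns a shorthand name for this PCollection.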
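
PCollection<S> filter(FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting PCollection.

PCollection<S> filter(String name, FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting PCollection.
Parameters:
name - An identifier for this processing step
filterFn - The FilterFn to apply

<K> PTable<K,S> by(MapFn<S,K> extractKeyFn, PType<K> keyType)
Apply the given map function to each element of this instance in order to create a PTable.

<K> PTable<K,S> by(String name, MapFn<S,K> extractKeyFn, PType<K> keyType)
Apply the given map function to each element of this instance in order to create a PTable.
Parameters:
name - An identifier for this processing step
extractKeyFn - The MapFn to apply

PTable<S,Long> count()
Returns a PTable instance that contains the counts of each unique element of this PCollection.

Copyright © 2013 The Apache Software Foundation. All Rights Reserved.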