public abstract class PCollectionImpl<S> extends Object implements PCollection<S>
Modifier and Type | Class and Description |
---|---|
static interface |
PCollectionImpl.Visitor |
Constructor and Description |
---|
PCollectionImpl(String name,
DistributedPipeline pipeline) |
PCollectionImpl(String name,
DistributedPipeline pipeline,
ParallelDoOptions doOptions) |
Modifier and Type | Method and Description |
---|---|
void |
accept(PCollectionImpl.Visitor visitor) |
PCollection<S> |
aggregate(Aggregator<S> aggregator)
Returns a
PCollection that contains the result of aggregating all values in this instance. |
PObject<Collection<S>> |
asCollection() |
ReadableData<S> |
asReadable(boolean materialize) |
<K> PTable<K,S> |
by(MapFn<S,K> mapFn,
PType<K> keyType)
Apply the given map function to each element of this instance in order to
create a
PTable . |
<K> PTable<K,S> |
by(String name,
MapFn<S,K> mapFn,
PType<K> keyType)
Apply the given map function to each element of this instance in order to
create a
PTable . |
PCollection<S> |
cache()
Marks this data as cached using the default
CachingOptions . |
PCollection<S> |
cache(CachingOptions options)
Marks this data as cached using the given
CachingOptions . |
PTable<S,Long> |
count()
Returns a
PTable instance that contains the counts of each unique
element of this PCollection. |
PCollection<S> |
filter(FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting
PCollection . |
PCollection<S> |
filter(String name,
FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting
PCollection . |
PObject<S> |
first() |
int |
getDepth() |
abstract long |
getLastModifiedAt()
The time of the most recent modification to one of the input sources to the collection.
|
SourceTarget<S> |
getMaterializedAt() |
String |
getName()
Returns a shorthand name for this PCollection.
|
PCollectionImpl<?> |
getOnlyParent() |
ParallelDoOptions |
getParallelDoOptions() |
abstract List<PCollectionImpl<?>> |
getParents() |
DistributedPipeline |
getPipeline()
Returns the
Pipeline associated with this PCollection. |
long |
getSize()
Returns the size of the data represented by this
PCollection in
bytes. |
Set<Target> |
getTargetDependencies() |
PTypeFamily |
getTypeFamily()
Returns the
PTypeFamily of this PCollection . |
boolean |
isBreakpoint() |
PObject<Long> |
length()
Returns the number of elements represented by this
PCollection . |
Iterable<S> |
materialize()
Returns a reference to the data set represented by this PCollection that
may be used by the client to read the data locally.
|
void |
materializeAt(SourceTarget<S> sourceTarget) |
PObject<S> |
max()
Returns a
PObject of the maximum element of this instance. |
PObject<S> |
min()
Returns a
PObject of the minimum element of this instance. |
<K,V> PTable<K,V> |
parallelDo(DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type)
Similar to the other
parallelDo instance, but returns a
PTable instance instead of a PCollection . |
<T> PCollection<T> |
parallelDo(DoFn<S,T> fn,
PType<T> type)
Applies the given doFn to the elements of this
PCollection and
returns a new PCollection that is the output of this processing. |
<K,V> PTable<K,V> |
parallelDo(String name,
DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type)
Similar to the other
parallelDo instance, but returns a
PTable instance instead of a PCollection . |
<K,V> PTable<K,V> |
parallelDo(String name,
DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type,
ParallelDoOptions options)
Similar to the other
parallelDo instance, but returns a
PTable instance instead of a PCollection . |
<T> PCollection<T> |
parallelDo(String name,
DoFn<S,T> fn,
PType<T> type)
Applies the given doFn to the elements of this
PCollection and
returns a new PCollection that is the output of this processing. |
<T> PCollection<T> |
parallelDo(String name,
DoFn<S,T> fn,
PType<T> type,
ParallelDoOptions options)
Applies the given doFn to the elements of this
PCollection and
returns a new PCollection that is the output of this processing. |
<Output> Output |
sequentialDo(String label,
PipelineCallable<Output> pipelineCallable)
Adds the materialized data in this
PCollection as a dependency to the given
PipelineCallable and registers it with the Pipeline associated with this
instance. |
void |
setBreakpoint() |
String |
toString() |
PCollection<S> |
union(PCollection<S>... collections)
Returns a
PCollection instance that acts as the union of this
PCollection and the input PCollection s. |
PCollection<S> |
union(PCollection<S> other)
Returns a
PCollection instance that acts as the union of this
PCollection and the given PCollection . |
PCollection<S> |
write(Target target)
Write the contents of this
PCollection to the given Target ,
using the storage format specified by the target. |
PCollection<S> |
write(Target target,
Target.WriteMode writeMode)
Write the contents of this
PCollection to the given Target ,
using the given Target.WriteMode to handle existing
targets. |
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
getPType
public PCollectionImpl(String name, DistributedPipeline pipeline)
public PCollectionImpl(String name, DistributedPipeline pipeline, ParallelDoOptions doOptions)
public String getName()
PCollection
getName
in interface PCollection<S>
public DistributedPipeline getPipeline()
PCollection
Pipeline
associated with this PCollection.getPipeline
in interface PCollection<S>
public ParallelDoOptions getParallelDoOptions()
public Iterable<S> materialize()
PCollection
materialize
in interface PCollection<S>
public PCollection<S> cache()
PCollection
CachingOptions
. Cached PCollection
s will only
be processed once, and then their contents will be saved so that downstream code can process them many times.cache
in interface PCollection<S>
PCollection
instancepublic PCollection<S> cache(CachingOptions options)
PCollection
CachingOptions
. Cached PCollection
s will only
be processed once and then their contents will be saved so that downstream code can process them many times.cache
in interface PCollection<S>
options
- the options that control the cache settings for the dataPCollection
instancepublic PCollection<S> union(PCollection<S> other)
PCollection
PCollection
instance that acts as the union of this
PCollection
and the given PCollection
.union
in interface PCollection<S>
public PCollection<S> union(PCollection<S>... collections)
PCollection
PCollection
instance that acts as the union of this
PCollection
and the input PCollection
s.union
in interface PCollection<S>
public <T> PCollection<T> parallelDo(DoFn<S,T> fn, PType<T> type)
PCollection
PCollection
and
returns a new PCollection
that is the output of this processing.parallelDo
in interface PCollection<S>
fn
- The DoFn
to applytype
- The PType
of the resulting PCollection
PCollection
public <T> PCollection<T> parallelDo(String name, DoFn<S,T> fn, PType<T> type)
PCollection
PCollection
and
returns a new PCollection
that is the output of this processing.parallelDo
in interface PCollection<S>
name
- An identifier for this processing step, useful for debuggingfn
- The DoFn
to applytype
- The PType
of the resulting PCollection
PCollection
public <T> PCollection<T> parallelDo(String name, DoFn<S,T> fn, PType<T> type, ParallelDoOptions options)
PCollection
PCollection
and
returns a new PCollection
that is the output of this processing.parallelDo
in interface PCollection<S>
name
- An identifier for this processing step, useful for debuggingfn
- The DoFn
to applytype
- The PType
of the resulting PCollection
options
- Optional information that is needed for certain pipeline operationsPCollection
public <K,V> PTable<K,V> parallelDo(DoFn<S,Pair<K,V>> fn, PTableType<K,V> type)
PCollection
parallelDo
instance, but returns a
PTable
instance instead of a PCollection
.parallelDo
in interface PCollection<S>
fn
- The DoFn
to applytype
- The PTableType
of the resulting PTable
PTable
public <K,V> PTable<K,V> parallelDo(String name, DoFn<S,Pair<K,V>> fn, PTableType<K,V> type)
PCollection
parallelDo
instance, but returns a
PTable
instance instead of a PCollection
.parallelDo
in interface PCollection<S>
name
- An identifier for this processing stepfn
- The DoFn
to applytype
- The PTableType
of the resulting PTable
PTable
public <K,V> PTable<K,V> parallelDo(String name, DoFn<S,Pair<K,V>> fn, PTableType<K,V> type, ParallelDoOptions options)
PCollection
parallelDo
instance, but returns a
PTable
instance instead of a PCollection
.parallelDo
in interface PCollection<S>
name
- An identifier for this processing stepfn
- The DoFn
to applytype
- The PTableType
of the resulting PTable
options
- Optional information that is needed for certain pipeline operationsPTable
public PCollection<S> write(Target target)
PCollection
PCollection
to the given Target
,
using the storage format specified by the target.write
in interface PCollection<S>
target
- The target to write topublic PCollection<S> write(Target target, Target.WriteMode writeMode)
PCollection
PCollection
to the given Target
,
using the given Target.WriteMode
to handle existing
targets.write
in interface PCollection<S>
target
- The targetwriteMode
- The rule for handling existing outputs at the target locationpublic void accept(PCollectionImpl.Visitor visitor)
public void setBreakpoint()
public boolean isBreakpoint()
public PObject<Collection<S>> asCollection()
asCollection
in interface PCollection<S>
PObject
encapsulating an in-memory Collection
containing the values
of this PCollection
.public PObject<S> first()
first
in interface PCollection<S>
PCollection
.public <Output> Output sequentialDo(String label, PipelineCallable<Output> pipelineCallable)
PCollection
PCollection
as a dependency to the given
PipelineCallable
and registers it with the Pipeline
associated with this
instance.sequentialDo
in interface PCollection<S>
label
- the label to use inside of the PipelineCallable for referencing this PCollectionpipelineCallable
- the function itselfgetOutput
function on the given argument.public SourceTarget<S> getMaterializedAt()
public void materializeAt(SourceTarget<S> sourceTarget)
public PCollection<S> filter(FilterFn<S> filterFn)
PCollection
PCollection
.filter
in interface PCollection<S>
public PCollection<S> filter(String name, FilterFn<S> filterFn)
PCollection
PCollection
.filter
in interface PCollection<S>
name
- An identifier for this processing stepfilterFn
- The FilterFn
to applypublic <K> PTable<K,S> by(MapFn<S,K> mapFn, PType<K> keyType)
PCollection
PTable
.by
in interface PCollection<S>
public <K> PTable<K,S> by(String name, MapFn<S,K> mapFn, PType<K> keyType)
PCollection
PTable
.by
in interface PCollection<S>
name
- An identifier for this processing stepmapFn
- The MapFn
to applypublic PTable<S,Long> count()
PCollection
PTable
instance that contains the counts of each unique
element of this PCollection.count
in interface PCollection<S>
public PObject<Long> length()
PCollection
PCollection
.length
in interface PCollection<S>
PObject
containing the number of elements in this PCollection
.public PObject<S> max()
PCollection
PObject
of the maximum element of this instance.max
in interface PCollection<S>
public PObject<S> min()
PCollection
PObject
of the minimum element of this instance.min
in interface PCollection<S>
public PCollection<S> aggregate(Aggregator<S> aggregator)
PCollection
PCollection
that contains the result of aggregating all values in this instance.aggregate
in interface PCollection<S>
public PTypeFamily getTypeFamily()
PCollection
PTypeFamily
of this PCollection
.getTypeFamily
in interface PCollection<S>
public abstract List<PCollectionImpl<?>> getParents()
public PCollectionImpl<?> getOnlyParent()
public int getDepth()
public ReadableData<S> asReadable(boolean materialize)
asReadable
in interface PCollection<S>
materialize
- If true, materialize this data before returning a reference to itpublic long getSize()
PCollection
PCollection
in
bytes.getSize
in interface PCollection<S>
public abstract long getLastModifiedAt()
-1
should be returned.Copyright © 2016 The Apache Software Foundation. All rights reserved.