public abstract class PCollectionImpl<S> extends Object implements PCollection<S>
| Modifier and Type | Class and Description | 
|---|---|
| static interface  | PCollectionImpl.Visitor | 
| Constructor and Description | 
|---|
| PCollectionImpl(String name,
               DistributedPipeline pipeline) | 
| PCollectionImpl(String name,
               DistributedPipeline pipeline,
               ParallelDoOptions doOptions) | 
| Modifier and Type | Method and Description | 
|---|---|
| void | accept(PCollectionImpl.Visitor visitor) | 
| PCollection<S> | aggregate(Aggregator<S> aggregator)Returns a  PCollectionthat contains the result of aggregating all values in this instance. | 
| PObject<Collection<S>> | asCollection() | 
| ReadableData<S> | asReadable(boolean materialize) | 
| <K> PTable<K,S> | by(MapFn<S,K> mapFn,
  PType<K> keyType)Apply the given map function to each element of this instance in order to
 create a  PTable. | 
| <K> PTable<K,S> | by(String name,
  MapFn<S,K> mapFn,
  PType<K> keyType)Apply the given map function to each element of this instance in order to
 create a  PTable. | 
| PCollection<S> | cache()Marks this data as cached using the default  CachingOptions. | 
| PCollection<S> | cache(CachingOptions options)Marks this data as cached using the given  CachingOptions. | 
| PTable<S,Long> | count()Returns a  PTableinstance that contains the counts of each unique
 element of this PCollection. | 
| PCollection<S> | filter(FilterFn<S> filterFn)Apply the given filter function to this instance and return the resulting
  PCollection. | 
| PCollection<S> | filter(String name,
      FilterFn<S> filterFn)Apply the given filter function to this instance and return the resulting
  PCollection. | 
| PObject<S> | first() | 
| int | getDepth() | 
| abstract long | getLastModifiedAt()The time of the most recent modification to one of the input sources to the collection. | 
| SourceTarget<S> | getMaterializedAt() | 
| String | getName()Returns a shorthand name for this PCollection. | 
| PCollectionImpl<?> | getOnlyParent() | 
| ParallelDoOptions | getParallelDoOptions() | 
| abstract List<PCollectionImpl<?>> | getParents() | 
| DistributedPipeline | getPipeline()Returns the  Pipelineassociated with this PCollection. | 
| long | getSize()Returns the size of the data represented by this  PCollectionin
 bytes. | 
| Set<Target> | getTargetDependencies() | 
| PTypeFamily | getTypeFamily()Returns the  PTypeFamilyof thisPCollection. | 
| boolean | isBreakpoint() | 
| PObject<Long> | length()Returns the number of elements represented by this  PCollection. | 
| Iterable<S> | materialize()Returns a reference to the data set represented by this PCollection that
 may be used by the client to read the data locally. | 
| void | materializeAt(SourceTarget<S> sourceTarget) | 
| PObject<S> | max()Returns a  PObjectof the maximum element of this instance. | 
| PObject<S> | min()Returns a  PObjectof the minimum element of this instance. | 
| <K,V> PTable<K,V> | parallelDo(DoFn<S,Pair<K,V>> fn,
          PTableType<K,V> type)Similar to the other  parallelDoinstance, but returns aPTableinstance instead of aPCollection. | 
| <T> PCollection<T> | parallelDo(DoFn<S,T> fn,
          PType<T> type)Applies the given doFn to the elements of this  PCollectionand
 returns a newPCollectionthat is the output of this processing. | 
| <K,V> PTable<K,V> | parallelDo(String name,
          DoFn<S,Pair<K,V>> fn,
          PTableType<K,V> type)Similar to the other  parallelDoinstance, but returns aPTableinstance instead of aPCollection. | 
| <K,V> PTable<K,V> | parallelDo(String name,
          DoFn<S,Pair<K,V>> fn,
          PTableType<K,V> type,
          ParallelDoOptions options)Similar to the other  parallelDoinstance, but returns aPTableinstance instead of aPCollection. | 
| <T> PCollection<T> | parallelDo(String name,
          DoFn<S,T> fn,
          PType<T> type)Applies the given doFn to the elements of this  PCollectionand
 returns a newPCollectionthat is the output of this processing. | 
| <T> PCollection<T> | parallelDo(String name,
          DoFn<S,T> fn,
          PType<T> type,
          ParallelDoOptions options)Applies the given doFn to the elements of this  PCollectionand
 returns a newPCollectionthat is the output of this processing. | 
| <Output> Output | sequentialDo(String label,
            PipelineCallable<Output> pipelineCallable)Adds the materialized data in this  PCollectionas a dependency to the givenPipelineCallableand registers it with thePipelineassociated with this
 instance. | 
| void | setBreakpoint() | 
| String | toString() | 
| PCollection<S> | union(PCollection<S>... collections)Returns a  PCollectioninstance that acts as the union of thisPCollectionand the inputPCollections. | 
| PCollection<S> | union(PCollection<S> other)Returns a  PCollectioninstance that acts as the union of thisPCollectionand the givenPCollection. | 
| PCollection<S> | write(Target target)Write the contents of this  PCollectionto the givenTarget,
 using the storage format specified by the target. | 
| PCollection<S> | write(Target target,
     Target.WriteMode writeMode)Write the contents of this  PCollectionto the givenTarget,
 using the givenTarget.WriteModeto handle existing
 targets. | 
equals, getClass, hashCode, notify, notifyAll, wait, wait, waitgetPTypepublic PCollectionImpl(String name, DistributedPipeline pipeline)
public PCollectionImpl(String name, DistributedPipeline pipeline, ParallelDoOptions doOptions)
public String getName()
PCollectiongetName in interface PCollection<S>public DistributedPipeline getPipeline()
PCollectionPipeline associated with this PCollection.getPipeline in interface PCollection<S>public ParallelDoOptions getParallelDoOptions()
public Iterable<S> materialize()
PCollectionmaterialize in interface PCollection<S>public PCollection<S> cache()
PCollectionCachingOptions. Cached PCollections will only
 be processed once, and then their contents will be saved so that downstream code can process them many times.cache in interface PCollection<S>PCollection instancepublic PCollection<S> cache(CachingOptions options)
PCollectionCachingOptions. Cached PCollections will only
 be processed once and then their contents will be saved so that downstream code can process them many times.cache in interface PCollection<S>options - the options that control the cache settings for the dataPCollection instancepublic PCollection<S> union(PCollection<S> other)
PCollectionPCollection instance that acts as the union of this
 PCollection and the given PCollection.union in interface PCollection<S>public PCollection<S> union(PCollection<S>... collections)
PCollectionPCollection instance that acts as the union of this
 PCollection and the input PCollections.union in interface PCollection<S>public <T> PCollection<T> parallelDo(DoFn<S,T> fn, PType<T> type)
PCollectionPCollection and
 returns a new PCollection that is the output of this processing.parallelDo in interface PCollection<S>fn - The DoFn to applytype - The PType of the resulting PCollectionPCollectionpublic <T> PCollection<T> parallelDo(String name, DoFn<S,T> fn, PType<T> type)
PCollectionPCollection and
 returns a new PCollection that is the output of this processing.parallelDo in interface PCollection<S>name - An identifier for this processing step, useful for debuggingfn - The DoFn to applytype - The PType of the resulting PCollectionPCollectionpublic <T> PCollection<T> parallelDo(String name, DoFn<S,T> fn, PType<T> type, ParallelDoOptions options)
PCollectionPCollection and
 returns a new PCollection that is the output of this processing.parallelDo in interface PCollection<S>name - An identifier for this processing step, useful for debuggingfn - The DoFn to applytype - The PType of the resulting PCollectionoptions - Optional information that is needed for certain pipeline operationsPCollectionpublic <K,V> PTable<K,V> parallelDo(DoFn<S,Pair<K,V>> fn, PTableType<K,V> type)
PCollectionparallelDo instance, but returns a
 PTable instance instead of a PCollection.parallelDo in interface PCollection<S>fn - The DoFn to applytype - The PTableType of the resulting PTablePTablepublic <K,V> PTable<K,V> parallelDo(String name, DoFn<S,Pair<K,V>> fn, PTableType<K,V> type)
PCollectionparallelDo instance, but returns a
 PTable instance instead of a PCollection.parallelDo in interface PCollection<S>name - An identifier for this processing stepfn - The DoFn to applytype - The PTableType of the resulting PTablePTablepublic <K,V> PTable<K,V> parallelDo(String name, DoFn<S,Pair<K,V>> fn, PTableType<K,V> type, ParallelDoOptions options)
PCollectionparallelDo instance, but returns a
 PTable instance instead of a PCollection.parallelDo in interface PCollection<S>name - An identifier for this processing stepfn - The DoFn to applytype - The PTableType of the resulting PTableoptions - Optional information that is needed for certain pipeline operationsPTablepublic PCollection<S> write(Target target)
PCollectionPCollection to the given Target,
 using the storage format specified by the target.write in interface PCollection<S>target - The target to write topublic PCollection<S> write(Target target, Target.WriteMode writeMode)
PCollectionPCollection to the given Target,
 using the given Target.WriteMode to handle existing
 targets.write in interface PCollection<S>target - The targetwriteMode - The rule for handling existing outputs at the target locationpublic void accept(PCollectionImpl.Visitor visitor)
public void setBreakpoint()
public boolean isBreakpoint()
public PObject<Collection<S>> asCollection()
asCollection in interface PCollection<S>PObject encapsulating an in-memory Collection containing the values
 of this PCollection.public PObject<S> first()
first in interface PCollection<S>PCollection.public <Output> Output sequentialDo(String label, PipelineCallable<Output> pipelineCallable)
PCollectionPCollection as a dependency to the given
 PipelineCallable and registers it with the Pipeline associated with this
 instance.sequentialDo in interface PCollection<S>label - the label to use inside of the PipelineCallable for referencing this PCollectionpipelineCallable - the function itselfgetOutput function on the given argument.public SourceTarget<S> getMaterializedAt()
public void materializeAt(SourceTarget<S> sourceTarget)
public PCollection<S> filter(FilterFn<S> filterFn)
PCollectionPCollection.filter in interface PCollection<S>public PCollection<S> filter(String name, FilterFn<S> filterFn)
PCollectionPCollection.filter in interface PCollection<S>name - An identifier for this processing stepfilterFn - The FilterFn to applypublic <K> PTable<K,S> by(MapFn<S,K> mapFn, PType<K> keyType)
PCollectionPTable.by in interface PCollection<S>public <K> PTable<K,S> by(String name, MapFn<S,K> mapFn, PType<K> keyType)
PCollectionPTable.by in interface PCollection<S>name - An identifier for this processing stepmapFn - The MapFn to applypublic PTable<S,Long> count()
PCollectionPTable instance that contains the counts of each unique
 element of this PCollection.count in interface PCollection<S>public PObject<Long> length()
PCollectionPCollection.length in interface PCollection<S>PObject containing the number of elements in this PCollection.public PObject<S> max()
PCollectionPObject of the maximum element of this instance.max in interface PCollection<S>public PObject<S> min()
PCollectionPObject of the minimum element of this instance.min in interface PCollection<S>public PCollection<S> aggregate(Aggregator<S> aggregator)
PCollectionPCollection that contains the result of aggregating all values in this instance.aggregate in interface PCollection<S>public PTypeFamily getTypeFamily()
PCollectionPTypeFamily of this PCollection.getTypeFamily in interface PCollection<S>public abstract List<PCollectionImpl<?>> getParents()
public PCollectionImpl<?> getOnlyParent()
public int getDepth()
public ReadableData<S> asReadable(boolean materialize)
asReadable in interface PCollection<S>materialize - If true, materialize this data before returning a reference to itpublic long getSize()
PCollectionPCollection in
 bytes.getSize in interface PCollection<S>public abstract long getLastModifiedAt()
-1 should be returned.Copyright © 2017 The Apache Software Foundation. All rights reserved.