public abstract class DistributedPipeline extends Object implements Pipeline
| Constructor and Description | 
|---|
| DistributedPipeline(String name,
                   org.apache.hadoop.conf.Configuration conf,
                   PCollectionFactory factory)Instantiate with a custom name and configuration. | 
| Modifier and Type | Method and Description | 
|---|---|
| void | cleanup(boolean force)Cleans up any artifacts created as a result of  runningthe pipeline. | 
| <K,V> PTable<K,V> | create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype)Creates a  PTablecontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| <K,V> PTable<K,V> | create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype,
      CreateOptions options)Creates a  PTablecontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| <S> PCollection<S> | create(Iterable<S> contents,
      PType<S> ptype)Creates a  PCollectioncontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| <S> PCollection<S> | create(Iterable<S> contents,
      PType<S> ptype,
      CreateOptions options)Creates a  PCollectioncontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| <T> SourceTarget<T> | createIntermediateOutput(PType<T> ptype) | 
| org.apache.hadoop.fs.Path | createTempPath() | 
| PipelineResult | done()Run any remaining jobs required to generate outputs and then clean up any
 intermediate data files that were created in this run or previous calls to
  run. | 
| <S> PCollection<S> | emptyPCollection(PType<S> ptype)Creates an empty  PCollectionof the givenPType. | 
| <K,V> PTable<K,V> | emptyPTable(PTableType<K,V> ptype)Creates an empty  PTableof the givenPTable Type. | 
| void | enableDebug()Turn on debug logging for jobs that are run from this pipeline. | 
| org.apache.hadoop.conf.Configuration | getConfiguration()Returns the  Configurationinstance associated with this pipeline. | 
| PCollectionFactory | getFactory() | 
| <T> ReadableSource<T> | getMaterializeSourceTarget(PCollection<T> pcollection)Retrieve a ReadableSourceTarget that provides access to the contents of a  PCollection. | 
| String | getName()Returns the name of this pipeline. | 
| int | getNextAnonymousStageId() | 
| <S> PCollection<S> | read(Source<S> source)Converts the given  Sourceinto aPCollectionthat is
 available to jobs run using thisPipelineinstance. | 
| <S> PCollection<S> | read(Source<S> source,
    String named)Converts the given  Sourceinto aPCollectionthat is
 available to jobs run using thisPipelineinstance. | 
| <K,V> PTable<K,V> | read(TableSource<K,V> source)A version of the read method for  TableSourceinstances that map toPTables. | 
| <K,V> PTable<K,V> | read(TableSource<K,V> source,
    String named)A version of the read method for  TableSourceinstances that map toPTables. | 
| PCollection<String> | readTextFile(String pathName)A convenience method for reading a text file. | 
| <Output> Output | sequentialDo(PipelineCallable<Output> pipelineCallable)Executes the given  PipelineCallableon the client after theTargetsthat the PipelineCallable depends on (if any) have been created by other pipeline
 processing steps. | 
| void | setConfiguration(org.apache.hadoop.conf.Configuration conf)Set the  Configurationto use with this pipeline. | 
| <S> PCollection<S> | union(List<PCollection<S>> collections) | 
| <K,V> PTable<K,V> | unionTables(List<PTable<K,V>> tables) | 
| void | write(PCollection<?> pcollection,
     Target target)Write the given collection to the given target on the next pipeline run. | 
| void | write(PCollection<?> pcollection,
     Target target,
     Target.WriteMode writeMode)Write the contents of the  PCollectionto the givenTarget,
 using the storage format specified by the target and the givenWriteModefor cases where the referencedTargetalready exists. | 
| <T> void | writeTextFile(PCollection<T> pcollection,
             String pathName)A convenience method for writing a text file. | 
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitcache, materialize, run, runAsyncpublic DistributedPipeline(String name, org.apache.hadoop.conf.Configuration conf, PCollectionFactory factory)
name - Display name of the pipelineconf - Configuration to be used within all MapReduce jobs run in the pipelinepublic PCollectionFactory getFactory()
public org.apache.hadoop.conf.Configuration getConfiguration()
PipelineConfiguration instance associated with this pipeline.getConfiguration in interface Pipelinepublic void setConfiguration(org.apache.hadoop.conf.Configuration conf)
PipelineConfiguration to use with this pipeline.setConfiguration in interface Pipelinepublic PipelineResult done()
Pipelinerun.public <S> PCollection<S> union(List<PCollection<S>> collections)
public <K,V> PTable<K,V> unionTables(List<PTable<K,V>> tables)
unionTables in interface Pipelinepublic <Output> Output sequentialDo(PipelineCallable<Output> pipelineCallable)
PipelinePipelineCallable on the client after the Targets
 that the PipelineCallable depends on (if any) have been created by other pipeline
 processing steps.sequentialDo in interface PipelineOutput - The return type of the PipelineCallablepipelineCallable - The sequential logic to executepublic <S> PCollection<S> read(Source<S> source)
PipelineSource into a PCollection that is
 available to jobs run using this Pipeline instance.public <S> PCollection<S> read(Source<S> source, String named)
PipelineSource into a PCollection that is
 available to jobs run using this Pipeline instance.public <K,V> PTable<K,V> read(TableSource<K,V> source)
PipelineTableSource instances that map to
 PTables.public <K,V> PTable<K,V> read(TableSource<K,V> source, String named)
PipelineTableSource instances that map to
 PTables.public PCollection<String> readTextFile(String pathName)
PipelinereadTextFile in interface Pipelinepublic void write(PCollection<?> pcollection, Target target)
PipelineWriteMode.DEFAULT rule for the given Target.public void write(PCollection<?> pcollection, Target target, Target.WriteMode writeMode)
PipelinePCollection to the given Target,
 using the storage format specified by the target and the given
 WriteMode for cases where the referenced Target
 already exists.public <S> PCollection<S> emptyPCollection(PType<S> ptype)
PipelinePCollection of the given PType.emptyPCollection in interface Pipelineptype - The PType of the empty PCollectionpublic <K,V> PTable<K,V> emptyPTable(PTableType<K,V> ptype)
PipelinePTable of the given PTable Type.emptyPTable in interface Pipelineptype - The PTableType of the empty PTablepublic <S> PCollection<S> create(Iterable<S> contents, PType<S> ptype)
PipelinePCollection containing the values found in the given Iterable
 using an implementation-specific distribution mechanism.public <S> PCollection<S> create(Iterable<S> contents, PType<S> ptype, CreateOptions options)
PipelinePCollection containing the values found in the given Iterable
 using an implementation-specific distribution mechanism.public <K,V> PTable<K,V> create(Iterable<Pair<K,V>> contents, PTableType<K,V> ptype)
PipelinePTable containing the values found in the given Iterable
 using an implementation-specific distribution mechanism.public <K,V> PTable<K,V> create(Iterable<Pair<K,V>> contents, PTableType<K,V> ptype, CreateOptions options)
PipelinePTable containing the values found in the given Iterable
 using an implementation-specific distribution mechanism.public <T> ReadableSource<T> getMaterializeSourceTarget(PCollection<T> pcollection)
PCollection.
 This is primarily intended as a helper method to Pipeline.materialize(PCollection). The
 underlying data of the ReadableSourceTarget may not be actually present until the pipeline is
 run.pcollection - The collection for which the ReadableSourceTarget is to be retrievedIllegalArgumentException - If no ReadableSourceTarget can be retrieved for the given
           PCollectionpublic <T> SourceTarget<T> createIntermediateOutput(PType<T> ptype)
public org.apache.hadoop.fs.Path createTempPath()
public <T> void writeTextFile(PCollection<T> pcollection, String pathName)
PipelinewriteTextFile in interface Pipelinepublic void cleanup(boolean force)
Pipelinerunning the pipeline.public int getNextAnonymousStageId()
public void enableDebug()
PipelineenableDebug in interface PipelineCopyright © 2017 The Apache Software Foundation. All rights reserved.