public interface Pipeline
| Modifier and Type | Method and Description | 
|---|---|
| <T> void | cache(PCollection<T> pcollection,
     CachingOptions options)Caches the given PCollection so that it will be processed at most once
 during pipeline execution. | 
| void | cleanup(boolean force)Cleans up any artifacts created as a result of  runningthe pipeline. | 
| <K,V> PTable<K,V> | create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype)Creates a  PTablecontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| <K,V> PTable<K,V> | create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype,
      CreateOptions options)Creates a  PTablecontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| <T> PCollection<T> | create(Iterable<T> contents,
      PType<T> ptype)Creates a  PCollectioncontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| <T> PCollection<T> | create(Iterable<T> contents,
      PType<T> ptype,
      CreateOptions options)Creates a  PCollectioncontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| PipelineResult | done()Run any remaining jobs required to generate outputs and then clean up any
 intermediate data files that were created in this run or previous calls to
  run. | 
| <T> PCollection<T> | emptyPCollection(PType<T> ptype)Creates an empty  PCollectionof the givenPType. | 
| <K,V> PTable<K,V> | emptyPTable(PTableType<K,V> ptype)Creates an empty  PTableof the givenPTable Type. | 
| void | enableDebug()Turn on debug logging for jobs that are run from this pipeline. | 
| org.apache.hadoop.conf.Configuration | getConfiguration()Returns the  Configurationinstance associated with this pipeline. | 
| String | getName()Returns the name of this pipeline. | 
| <T> Iterable<T> | materialize(PCollection<T> pcollection)Create the given PCollection and read the data it contains into the
 returned Collection instance for client use. | 
| <T> PCollection<T> | read(Source<T> source)Converts the given  Sourceinto aPCollectionthat is
 available to jobs run using thisPipelineinstance. | 
| <T> PCollection<T> | read(Source<T> source,
    String named)Converts the given  Sourceinto aPCollectionthat is
 available to jobs run using thisPipelineinstance. | 
| <K,V> PTable<K,V> | read(TableSource<K,V> tableSource)A version of the read method for  TableSourceinstances that map toPTables. | 
| <K,V> PTable<K,V> | read(TableSource<K,V> tableSource,
    String named)A version of the read method for  TableSourceinstances that map toPTables. | 
| PCollection<String> | readTextFile(String pathName)A convenience method for reading a text file. | 
| PipelineResult | run()Constructs and executes a series of MapReduce jobs in order to write data
 to the output targets. | 
| PipelineExecution | runAsync()Constructs and starts a series of MapReduce jobs in order ot write data to
 the output targets, but returns a  ListenableFutureto allow clients to control
 job execution. | 
| <Output> Output | sequentialDo(PipelineCallable<Output> pipelineCallable)Executes the given  PipelineCallableon the client after theTargetsthat the PipelineCallable depends on (if any) have been created by other pipeline
 processing steps. | 
| void | setConfiguration(org.apache.hadoop.conf.Configuration conf)Set the  Configurationto use with this pipeline. | 
| <S> PCollection<S> | union(List<PCollection<S>> collections) | 
| <K,V> PTable<K,V> | unionTables(List<PTable<K,V>> tables) | 
| void | write(PCollection<?> collection,
     Target target)Write the given collection to the given target on the next pipeline run. | 
| void | write(PCollection<?> collection,
     Target target,
     Target.WriteMode writeMode)Write the contents of the  PCollectionto the givenTarget,
 using the storage format specified by the target and the givenWriteModefor cases where the referencedTargetalready exists. | 
| <T> void | writeTextFile(PCollection<T> collection,
             String pathName)A convenience method for writing a text file. | 
void setConfiguration(org.apache.hadoop.conf.Configuration conf)
Configuration to use with this pipeline.String getName()
org.apache.hadoop.conf.Configuration getConfiguration()
Configuration instance associated with this pipeline.<T> PCollection<T> read(Source<T> source)
Source into a PCollection that is
 available to jobs run using this Pipeline instance.source - The source of data<T> PCollection<T> read(Source<T> source, String named)
Source into a PCollection that is
 available to jobs run using this Pipeline instance.source - The source of datanamed - A name for the returned PCollection<K,V> PTable<K,V> read(TableSource<K,V> tableSource)
TableSource instances that map to
 PTables.tableSource - The source of the data<K,V> PTable<K,V> read(TableSource<K,V> tableSource, String named)
TableSource instances that map to
 PTables.tableSource - The source of the datanamed - A name for the returned PTablevoid write(PCollection<?> collection, Target target)
WriteMode.DEFAULT rule for the given Target.collection - The collectiontarget - The output targetvoid write(PCollection<?> collection, Target target, Target.WriteMode writeMode)
PCollection to the given Target,
 using the storage format specified by the target and the given
 WriteMode for cases where the referenced Target
 already exists.collection - The collectiontarget - The target to write towriteMode - The strategy to use for handling existing outputs<T> Iterable<T> materialize(PCollection<T> pcollection)
pcollection - The PCollection to materialize<T> void cache(PCollection<T> pcollection, CachingOptions options)
pcollection - The PCollection to cacheoptions - The options for how the cached data is stored<T> PCollection<T> emptyPCollection(PType<T> ptype)
PCollection of the given PType.ptype - The PType of the empty PCollection<K,V> PTable<K,V> emptyPTable(PTableType<K,V> ptype)
PTable of the given PTable Type.ptype - The PTableType of the empty PTable<T> PCollection<T> create(Iterable<T> contents, PType<T> ptype)
PCollection containing the values found in the given Iterable
 using an implementation-specific distribution mechanism.contents - The values the new PCollection will containptype - The PType of the PCollection<T> PCollection<T> create(Iterable<T> contents, PType<T> ptype, CreateOptions options)
PCollection containing the values found in the given Iterable
 using an implementation-specific distribution mechanism.contents - The values the new PCollection will containptype - The PType of the PCollectionoptions - Additional options, such as the name or desired parallelism of the PCollection<K,V> PTable<K,V> create(Iterable<Pair<K,V>> contents, PTableType<K,V> ptype)
PTable containing the values found in the given Iterable
 using an implementation-specific distribution mechanism.contents - The values the new PTable will containptype - The PTableType of the PTable<K,V> PTable<K,V> create(Iterable<Pair<K,V>> contents, PTableType<K,V> ptype, CreateOptions options)
PTable containing the values found in the given Iterable
 using an implementation-specific distribution mechanism.contents - The values the new PTable will containptype - The PTableType of the PTableoptions - Additional options, such as the name or desired parallelism of the PTable<S> PCollection<S> union(List<PCollection<S>> collections)
<Output> Output sequentialDo(PipelineCallable<Output> pipelineCallable)
PipelineCallable on the client after the Targets
 that the PipelineCallable depends on (if any) have been created by other pipeline
 processing steps.Output - The return type of the PipelineCallablepipelineCallable - The sequential logic to executePipelineResult run()
PipelineExecution runAsync()
ListenableFuture to allow clients to control
 job execution.PipelineResult done()
run.void cleanup(boolean force)
running the pipeline.force - forces the cleanup even if all targets of the pipeline have not been completed.PCollection<String> readTextFile(String pathName)
<T> void writeTextFile(PCollection<T> collection, String pathName)
void enableDebug()
Copyright © 2017 The Apache Software Foundation. All rights reserved.