| Modifier and Type | Method and Description | 
|---|---|
| <T> void | cache(PCollection<T> pcollection,
     CachingOptions options)Caches the given PCollection so that it will be processed at most once
 during pipeline execution. | 
| void | cleanup(boolean force)Cleans up any artifacts created as a result of  runningthe pipeline. | 
| static void | clearCounters() | 
| static <T> PCollection<T> | collectionOf(Iterable<T> collect) | 
| static <T> PCollection<T> | collectionOf(T... ts) | 
| <K,V> PTable<K,V> | create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype)Creates a  PTablecontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| <K,V> PTable<K,V> | create(Iterable<Pair<K,V>> contents,
      PTableType<K,V> ptype,
      CreateOptions options)Creates a  PTablecontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| <T> PCollection<T> | create(Iterable<T> contents,
      PType<T> ptype)Creates a  PCollectioncontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| <T> PCollection<T> | create(Iterable<T> iterable,
      PType<T> ptype,
      CreateOptions options)Creates a  PCollectioncontaining the values found in the givenIterableusing an implementation-specific distribution mechanism. | 
| PipelineResult | done()Run any remaining jobs required to generate outputs and then clean up any
 intermediate data files that were created in this run or previous calls to
  run. | 
| <T> PCollection<T> | emptyPCollection(PType<T> ptype)Creates an empty  PCollectionof the givenPType. | 
| <K,V> PTable<K,V> | emptyPTable(PTableType<K,V> ptype)Creates an empty  PTableof the givenPTable Type. | 
| void | enableDebug()Turn on debug logging for jobs that are run from this pipeline. | 
| org.apache.hadoop.conf.Configuration | getConfiguration()Returns the  Configurationinstance associated with this pipeline. | 
| static org.apache.hadoop.mapreduce.Counters | getCounters() | 
| static Pipeline | getInstance() | 
| String | getName()Returns the name of this pipeline. | 
| <T> Iterable<T> | materialize(PCollection<T> pcollection)Create the given PCollection and read the data it contains into the
 returned Collection instance for client use. | 
| <T> PCollection<T> | read(Source<T> source)Converts the given  Sourceinto aPCollectionthat is
 available to jobs run using thisPipelineinstance. | 
| <T> PCollection<T> | read(Source<T> source,
    String named)Converts the given  Sourceinto aPCollectionthat is
 available to jobs run using thisPipelineinstance. | 
| <K,V> PTable<K,V> | read(TableSource<K,V> source)A version of the read method for  TableSourceinstances that map toPTables. | 
| <K,V> PTable<K,V> | read(TableSource<K,V> source,
    String named)A version of the read method for  TableSourceinstances that map toPTables. | 
| PCollection<String> | readTextFile(String pathName)A convenience method for reading a text file. | 
| PipelineResult | run()Constructs and executes a series of MapReduce jobs in order to write data
 to the output targets. | 
| PipelineExecution | runAsync()Constructs and starts a series of MapReduce jobs in order ot write data to
 the output targets, but returns a  ListenableFutureto allow clients to control
 job execution. | 
| <Output> Output | sequentialDo(PipelineCallable<Output> callable)Executes the given  PipelineCallableon the client after theTargetsthat the PipelineCallable depends on (if any) have been created by other pipeline
 processing steps. | 
| void | setConfiguration(org.apache.hadoop.conf.Configuration conf)Set the  Configurationto use with this pipeline. | 
| static <S,T> PTable<S,T> | tableOf(Iterable<Pair<S,T>> pairs) | 
| static <S,T> PTable<S,T> | tableOf(S s,
       T t,
       Object... more) | 
| static <T> PCollection<T> | typedCollectionOf(PType<T> ptype,
                 Iterable<T> collect) | 
| static <T> PCollection<T> | typedCollectionOf(PType<T> ptype,
                 T... ts) | 
| static <S,T> PTable<S,T> | typedTableOf(PTableType<S,T> ptype,
            Iterable<Pair<S,T>> pairs) | 
| static <S,T> PTable<S,T> | typedTableOf(PTableType<S,T> ptype,
            S s,
            T t,
            Object... more) | 
| <S> PCollection<S> | union(List<PCollection<S>> collections) | 
| <K,V> PTable<K,V> | unionTables(List<PTable<K,V>> tables) | 
| void | write(PCollection<?> collection,
     Target target)Write the given collection to the given target on the next pipeline run. | 
| void | write(PCollection<?> collection,
     Target target,
     Target.WriteMode writeMode)Write the contents of the  PCollectionto the givenTarget,
 using the storage format specified by the target and the givenWriteModefor cases where the referencedTargetalready exists. | 
| <T> void | writeTextFile(PCollection<T> collection,
             String pathName)A convenience method for writing a text file. | 
public static org.apache.hadoop.mapreduce.Counters getCounters()
public static void clearCounters()
public static Pipeline getInstance()
public static <T> PCollection<T> collectionOf(T... ts)
public static <T> PCollection<T> collectionOf(Iterable<T> collect)
public static <T> PCollection<T> typedCollectionOf(PType<T> ptype, T... ts)
public static <T> PCollection<T> typedCollectionOf(PType<T> ptype, Iterable<T> collect)
public static <S,T> PTable<S,T> typedTableOf(PTableType<S,T> ptype, S s, T t, Object... more)
public static <S,T> PTable<S,T> typedTableOf(PTableType<S,T> ptype, Iterable<Pair<S,T>> pairs)
public void setConfiguration(org.apache.hadoop.conf.Configuration conf)
PipelineConfiguration to use with this pipeline.setConfiguration in interface Pipelinepublic org.apache.hadoop.conf.Configuration getConfiguration()
PipelineConfiguration instance associated with this pipeline.getConfiguration in interface Pipelinepublic <T> PCollection<T> read(Source<T> source)
PipelineSource into a PCollection that is
 available to jobs run using this Pipeline instance.public <T> PCollection<T> read(Source<T> source, String named)
PipelineSource into a PCollection that is
 available to jobs run using this Pipeline instance.public <K,V> PTable<K,V> read(TableSource<K,V> source)
PipelineTableSource instances that map to
 PTables.public <K,V> PTable<K,V> read(TableSource<K,V> source, String named)
PipelineTableSource instances that map to
 PTables.public void write(PCollection<?> collection, Target target)
PipelineWriteMode.DEFAULT rule for the given Target.public void write(PCollection<?> collection, Target target, Target.WriteMode writeMode)
PipelinePCollection to the given Target,
 using the storage format specified by the target and the given
 WriteMode for cases where the referenced Target
 already exists.public PCollection<String> readTextFile(String pathName)
PipelinereadTextFile in interface Pipelinepublic <T> void writeTextFile(PCollection<T> collection, String pathName)
PipelinewriteTextFile in interface Pipelinepublic <T> Iterable<T> materialize(PCollection<T> pcollection)
Pipelinematerialize in interface Pipelinepcollection - The PCollection to materializepublic <T> void cache(PCollection<T> pcollection, CachingOptions options)
Pipelinepublic <T> PCollection<T> emptyPCollection(PType<T> ptype)
PipelinePCollection of the given PType.emptyPCollection in interface Pipelineptype - The PType of the empty PCollectionpublic <K,V> PTable<K,V> emptyPTable(PTableType<K,V> ptype)
PipelinePTable of the given PTable Type.emptyPTable in interface Pipelineptype - The PTableType of the empty PTablepublic <T> PCollection<T> create(Iterable<T> contents, PType<T> ptype)
PipelinePCollection containing the values found in the given Iterable
 using an implementation-specific distribution mechanism.public <T> PCollection<T> create(Iterable<T> iterable, PType<T> ptype, CreateOptions options)
PipelinePCollection containing the values found in the given Iterable
 using an implementation-specific distribution mechanism.public <K,V> PTable<K,V> create(Iterable<Pair<K,V>> contents, PTableType<K,V> ptype)
PipelinePTable containing the values found in the given Iterable
 using an implementation-specific distribution mechanism.public <K,V> PTable<K,V> create(Iterable<Pair<K,V>> contents, PTableType<K,V> ptype, CreateOptions options)
PipelinePTable containing the values found in the given Iterable
 using an implementation-specific distribution mechanism.public <S> PCollection<S> union(List<PCollection<S>> collections)
public <K,V> PTable<K,V> unionTables(List<PTable<K,V>> tables)
unionTables in interface Pipelinepublic <Output> Output sequentialDo(PipelineCallable<Output> callable)
PipelinePipelineCallable on the client after the Targets
 that the PipelineCallable depends on (if any) have been created by other pipeline
 processing steps.sequentialDo in interface PipelineOutput - The return type of the PipelineCallablecallable - The sequential logic to executepublic PipelineExecution runAsync()
PipelineListenableFuture to allow clients to control
 job execution.public PipelineResult run()
Pipelinepublic void cleanup(boolean force)
Pipelinerunning the pipeline.public PipelineResult done()
Pipelinerun.public void enableDebug()
PipelineenableDebug in interface PipelineCopyright © 2017 The Apache Software Foundation. All rights reserved.