public class MRPipeline extends DistributedPipeline
| Constructor and Description | 
|---|
| MRPipeline(Class<?> jarClass)Instantiate with a default Configuration and name. | 
| MRPipeline(Class<?> jarClass,
          org.apache.hadoop.conf.Configuration conf)Instantiate with a custom configuration and default naming. | 
| MRPipeline(Class<?> jarClass,
          String name)Instantiate with a custom pipeline name. | 
| MRPipeline(Class<?> jarClass,
          String name,
          org.apache.hadoop.conf.Configuration conf)Instantiate with a custom name and configuration. | 
| Modifier and Type | Method and Description | 
|---|---|
| MRPipeline | addCompletionHook(org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook hook) | 
| MRPipeline | addPrepareHook(org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook hook) | 
| <T> void | cache(PCollection<T> pcollection,
     CachingOptions options)Caches the given PCollection so that it will be processed at most once
 during pipeline execution. | 
| List<org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook> | getCompletionHooks() | 
| List<org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook> | getPrepareHooks() | 
| <T> Iterable<T> | materialize(PCollection<T> pcollection)Create the given PCollection and read the data it contains into the
 returned Collection instance for client use. | 
| org.apache.crunch.impl.mr.exec.MRExecutor | plan() | 
| PipelineResult | run()Constructs and executes a series of MapReduce jobs in order to write data
 to the output targets. | 
| MRPipelineExecution | runAsync()Constructs and starts a series of MapReduce jobs in order ot write data to
 the output targets, but returns a  ListenableFutureto allow clients to control
 job execution. | 
cleanup, create, create, create, create, createIntermediateOutput, createTempPath, done, emptyPCollection, emptyPTable, enableDebug, getConfiguration, getFactory, getMaterializeSourceTarget, getName, getNextAnonymousStageId, read, read, read, read, readTextFile, sequentialDo, setConfiguration, union, unionTables, write, write, writeTextFilepublic MRPipeline(Class<?> jarClass)
jarClass - Class containing the main driver method for running the pipelinepublic MRPipeline(Class<?> jarClass, String name)
jarClass - Class containing the main driver method for running the pipelinename - Display name of the pipelinepublic MRPipeline(Class<?> jarClass, org.apache.hadoop.conf.Configuration conf)
jarClass - Class containing the main driver method for running the pipelineconf - Configuration to be used within all MapReduce jobs run in the pipelinepublic MRPipeline(Class<?> jarClass, String name, org.apache.hadoop.conf.Configuration conf)
jarClass - Class containing the main driver method for running the pipelinename - Display name of the pipelineconf - Configuration to be used within all MapReduce jobs run in the pipelinepublic MRPipeline addPrepareHook(org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook hook)
public List<org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook> getPrepareHooks()
public MRPipeline addCompletionHook(org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook hook)
public List<org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook> getCompletionHooks()
public org.apache.crunch.impl.mr.exec.MRExecutor plan()
public PipelineResult run()
Pipelinepublic MRPipelineExecution runAsync()
PipelineListenableFuture to allow clients to control
 job execution.public <T> Iterable<T> materialize(PCollection<T> pcollection)
Pipelinepcollection - The PCollection to materializepublic <T> void cache(PCollection<T> pcollection, CachingOptions options)
Pipelinepcollection - The PCollection to cacheoptions - The options for how the cached data is storedCopyright © 2017 The Apache Software Foundation. All rights reserved.