public class MRPipeline extends DistributedPipeline
| Constructor and Description |
|---|
MRPipeline(Class<?> jarClass)
Instantiate with a default Configuration and name.
|
MRPipeline(Class<?> jarClass,
org.apache.hadoop.conf.Configuration conf)
Instantiate with a custom configuration and default naming.
|
MRPipeline(Class<?> jarClass,
String name)
Instantiate with a custom pipeline name.
|
MRPipeline(Class<?> jarClass,
String name,
org.apache.hadoop.conf.Configuration conf)
Instantiate with a custom name and configuration.
|
| Modifier and Type | Method and Description |
|---|---|
<T> void |
cache(PCollection<T> pcollection,
CachingOptions options)
Caches the given PCollection so that it will be processed at most once
during pipeline execution.
|
<T> Iterable<T> |
materialize(PCollection<T> pcollection)
Create the given PCollection and read the data it contains into the
returned Collection instance for client use.
|
org.apache.crunch.impl.mr.exec.MRExecutor |
plan() |
PipelineResult |
run()
Constructs and executes a series of MapReduce jobs in order to write data
to the output targets.
|
MRPipelineExecution |
runAsync()
Constructs and starts a series of MapReduce jobs in order ot write data to
the output targets, but returns a
ListenableFuture to allow clients to control
job execution. |
cleanup, create, create, create, create, createIntermediateOutput, createTempPath, done, emptyPCollection, emptyPTable, enableDebug, getConfiguration, getFactory, getMaterializeSourceTarget, getName, getNextAnonymousStageId, read, read, read, read, readTextFile, sequentialDo, setConfiguration, union, unionTables, write, write, writeTextFilepublic MRPipeline(Class<?> jarClass)
jarClass - Class containing the main driver method for running the pipelinepublic MRPipeline(Class<?> jarClass, String name)
jarClass - Class containing the main driver method for running the pipelinename - Display name of the pipelinepublic MRPipeline(Class<?> jarClass, org.apache.hadoop.conf.Configuration conf)
jarClass - Class containing the main driver method for running the pipelineconf - Configuration to be used within all MapReduce jobs run in the pipelinepublic MRPipeline(Class<?> jarClass, String name, org.apache.hadoop.conf.Configuration conf)
jarClass - Class containing the main driver method for running the pipelinename - Display name of the pipelineconf - Configuration to be used within all MapReduce jobs run in the pipelinepublic org.apache.crunch.impl.mr.exec.MRExecutor plan()
public PipelineResult run()
Pipelinepublic MRPipelineExecution runAsync()
PipelineListenableFuture to allow clients to control
job execution.public <T> Iterable<T> materialize(PCollection<T> pcollection)
Pipelinepcollection - The PCollection to materializepublic <T> void cache(PCollection<T> pcollection, CachingOptions options)
Pipelinepcollection - The PCollection to cacheoptions - The options for how the cached data is storedCopyright © 2015 The Apache Software Foundation. All Rights Reserved.