|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object org.apache.crunch.impl.dist.DistributedPipeline org.apache.crunch.impl.mr.MRPipeline
public class MRPipeline
Pipeline implementation that is executed within Hadoop MapReduce.
Field Summary |
---|
Fields inherited from class org.apache.crunch.impl.dist.DistributedPipeline |
---|
factory, outputTargets, outputTargetsToMaterialize |
Constructor Summary | |
---|---|
MRPipeline(Class<?> jarClass)
Instantiate with a default Configuration and name. |
|
MRPipeline(Class<?> jarClass,
org.apache.hadoop.conf.Configuration conf)
Instantiate with a custom configuration and default naming. |
|
MRPipeline(Class<?> jarClass,
String name)
Instantiate with a custom pipeline name. |
|
MRPipeline(Class<?> jarClass,
String name,
org.apache.hadoop.conf.Configuration conf)
Instantiate with a custom name and configuration. |
Method Summary | ||
---|---|---|
|
cache(PCollection<T> pcollection,
CachingOptions options)
Caches the given PCollection so that it will be processed at most once during pipeline execution. |
|
|
materialize(PCollection<T> pcollection)
Create the given PCollection and read the data it contains into the returned Collection instance for client use. |
|
MRExecutor |
plan()
|
|
PipelineResult |
run()
Constructs and executes a series of MapReduce jobs in order to write data to the output targets. |
|
MRPipelineExecution |
runAsync()
Constructs and starts a series of MapReduce jobs in order ot write data to the output targets, but returns a ListenableFuture to allow clients to control
job execution. |
Methods inherited from class org.apache.crunch.impl.dist.DistributedPipeline |
---|
cleanup, createIntermediateOutput, createTempPath, done, enableDebug, getConfiguration, getFactory, getMaterializeSourceTarget, getName, getNextAnonymousStageId, read, read, readTextFile, setConfiguration, write, write, writeTextFile |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public MRPipeline(Class<?> jarClass)
jarClass
- Class containing the main driver method for running the pipelinepublic MRPipeline(Class<?> jarClass, String name)
jarClass
- Class containing the main driver method for running the pipelinename
- Display name of the pipelinepublic MRPipeline(Class<?> jarClass, org.apache.hadoop.conf.Configuration conf)
jarClass
- Class containing the main driver method for running the pipelineconf
- Configuration to be used within all MapReduce jobs run in the pipelinepublic MRPipeline(Class<?> jarClass, String name, org.apache.hadoop.conf.Configuration conf)
jarClass
- Class containing the main driver method for running the pipelinename
- Display name of the pipelineconf
- Configuration to be used within all MapReduce jobs run in the pipelineMethod Detail |
---|
public MRExecutor plan()
public PipelineResult run()
Pipeline
public MRPipelineExecution runAsync()
Pipeline
ListenableFuture
to allow clients to control
job execution.
public <T> Iterable<T> materialize(PCollection<T> pcollection)
Pipeline
pcollection
- The PCollection to materialize
public <T> void cache(PCollection<T> pcollection, CachingOptions options)
Pipeline
pcollection
- The PCollection to cacheoptions
- The options for how the cached data is stored
|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |