public class MRPipeline extends DistributedPipeline
Constructor and Description |
---|
MRPipeline(Class<?> jarClass)
Instantiate with a default Configuration and name.
|
MRPipeline(Class<?> jarClass,
org.apache.hadoop.conf.Configuration conf)
Instantiate with a custom configuration and default naming.
|
MRPipeline(Class<?> jarClass,
String name)
Instantiate with a custom pipeline name.
|
MRPipeline(Class<?> jarClass,
String name,
org.apache.hadoop.conf.Configuration conf)
Instantiate with a custom name and configuration.
|
Modifier and Type | Method and Description |
---|---|
MRPipeline |
addCompletionHook(org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook hook) |
MRPipeline |
addPrepareHook(org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook hook) |
<T> void |
cache(PCollection<T> pcollection,
CachingOptions options)
Caches the given PCollection so that it will be processed at most once
during pipeline execution.
|
List<org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook> |
getCompletionHooks() |
List<org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook> |
getPrepareHooks() |
<T> Iterable<T> |
materialize(PCollection<T> pcollection)
Create the given PCollection and read the data it contains into the
returned Collection instance for client use.
|
org.apache.crunch.impl.mr.exec.MRExecutor |
plan() |
PipelineResult |
run()
Constructs and executes a series of MapReduce jobs in order to write data
to the output targets.
|
MRPipelineExecution |
runAsync()
Constructs and starts a series of MapReduce jobs in order ot write data to
the output targets, but returns a
ListenableFuture to allow clients to control
job execution. |
cleanup, create, create, create, create, createIntermediateOutput, createTempPath, done, emptyPCollection, emptyPTable, enableDebug, getConfiguration, getFactory, getMaterializeSourceTarget, getName, getNextAnonymousStageId, read, read, read, read, readTextFile, sequentialDo, setConfiguration, union, unionTables, write, write, writeTextFile
public MRPipeline(Class<?> jarClass)
jarClass
- Class containing the main driver method for running the pipelinepublic MRPipeline(Class<?> jarClass, String name)
jarClass
- Class containing the main driver method for running the pipelinename
- Display name of the pipelinepublic MRPipeline(Class<?> jarClass, org.apache.hadoop.conf.Configuration conf)
jarClass
- Class containing the main driver method for running the pipelineconf
- Configuration to be used within all MapReduce jobs run in the pipelinepublic MRPipeline(Class<?> jarClass, String name, org.apache.hadoop.conf.Configuration conf)
jarClass
- Class containing the main driver method for running the pipelinename
- Display name of the pipelineconf
- Configuration to be used within all MapReduce jobs run in the pipelinepublic MRPipeline addPrepareHook(org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook hook)
public List<org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook> getPrepareHooks()
public MRPipeline addCompletionHook(org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook hook)
public List<org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook> getCompletionHooks()
public org.apache.crunch.impl.mr.exec.MRExecutor plan()
public PipelineResult run()
Pipeline
public MRPipelineExecution runAsync()
Pipeline
ListenableFuture
to allow clients to control
job execution.public <T> Iterable<T> materialize(PCollection<T> pcollection)
Pipeline
pcollection
- The PCollection to materializepublic <T> void cache(PCollection<T> pcollection, CachingOptions options)
Pipeline
pcollection
- The PCollection to cacheoptions
- The options for how the cached data is storedCopyright © 2016 The Apache Software Foundation. All rights reserved.