|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Objectorg.apache.crunch.impl.mr.MRPipeline
public class MRPipeline
Pipeline implementation that is executed within Hadoop MapReduce.
Constructor Summary | |
---|---|
MRPipeline(Class<?> jarClass)
Instantiate with a default Configuration and name. |
|
MRPipeline(Class<?> jarClass,
org.apache.hadoop.conf.Configuration conf)
Instantiate with a custom configuration and default naming. |
|
MRPipeline(Class<?> jarClass,
String name)
Instantiate with a custom pipeline name. |
|
MRPipeline(Class<?> jarClass,
String name,
org.apache.hadoop.conf.Configuration conf)
Instantiate with a custom name and configuration. |
Method Summary | ||
---|---|---|
|
createIntermediateOutput(PType<T> ptype)
|
|
org.apache.hadoop.fs.Path |
createTempPath()
|
|
PipelineResult |
done()
Run any remaining jobs required to generate outputs and then clean up any intermediate data files that were created in this run or previous calls to run . |
|
void |
enableDebug()
Turn on debug logging for jobs that are run from this pipeline. |
|
org.apache.hadoop.conf.Configuration |
getConfiguration()
Returns the Configuration instance associated with this pipeline. |
|
|
getMaterializeSourceTarget(PCollection<T> pcollection)
Retrieve a ReadableSourceTarget that provides access to the contents of a PCollection . |
|
String |
getName()
Returns the name of this pipeline. |
|
int |
getNextAnonymousStageId()
|
|
|
materialize(PCollection<T> pcollection)
Create the given PCollection and read the data it contains into the returned Collection instance for client use. |
|
org.apache.crunch.impl.mr.exec.MRExecutor |
plan()
|
|
|
read(Source<S> source)
Converts the given Source into a PCollection that is
available to jobs run using this Pipeline instance. |
|
|
read(TableSource<K,V> source)
A version of the read method for TableSource instances that map to
PTable s. |
|
PCollection<String> |
readTextFile(String pathName)
A convenience method for reading a text file. |
|
PipelineResult |
run()
Constructs and executes a series of MapReduce jobs in order to write data to the output targets. |
|
MRPipelineExecution |
runAsync()
Constructs and starts a series of MapReduce jobs in order ot write data to the output targets, but returns a ListenableFuture to allow clients to control
job execution. |
|
void |
setConfiguration(org.apache.hadoop.conf.Configuration conf)
Set the Configuration to use with this pipeline. |
|
void |
write(PCollection<?> pcollection,
Target target)
Write the given collection to the given target on the next pipeline run. |
|
void |
write(PCollection<?> pcollection,
Target target,
Target.WriteMode writeMode)
Write the contents of the PCollection to the given Target ,
using the storage format specified by the target and the given
WriteMode for cases where the referenced Target
already exists. |
|
|
writeTextFile(PCollection<T> pcollection,
String pathName)
A convenience method for writing a text file. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public MRPipeline(Class<?> jarClass)
jarClass
- Class containing the main driver method for running the pipelinepublic MRPipeline(Class<?> jarClass, String name)
jarClass
- Class containing the main driver method for running the pipelinename
- Display name of the pipelinepublic MRPipeline(Class<?> jarClass, org.apache.hadoop.conf.Configuration conf)
jarClass
- Class containing the main driver method for running the pipelineconf
- Configuration to be used within all MapReduce jobs run in the pipelinepublic MRPipeline(Class<?> jarClass, String name, org.apache.hadoop.conf.Configuration conf)
jarClass
- Class containing the main driver method for running the pipelinename
- Display name of the pipelineconf
- Configuration to be used within all MapReduce jobs run in the pipelineMethod Detail |
---|
public org.apache.hadoop.conf.Configuration getConfiguration()
Pipeline
Configuration
instance associated with this pipeline.
getConfiguration
in interface Pipeline
public void setConfiguration(org.apache.hadoop.conf.Configuration conf)
Pipeline
Configuration
to use with this pipeline.
setConfiguration
in interface Pipeline
public org.apache.crunch.impl.mr.exec.MRExecutor plan()
public PipelineResult run()
Pipeline
run
in interface Pipeline
public MRPipelineExecution runAsync()
Pipeline
ListenableFuture
to allow clients to control
job execution.
runAsync
in interface Pipeline
public PipelineResult done()
Pipeline
run
.
done
in interface Pipeline
public <S> PCollection<S> read(Source<S> source)
Pipeline
Source
into a PCollection
that is
available to jobs run using this Pipeline
instance.
read
in interface Pipeline
source
- The source of data
public <K,V> PTable<K,V> read(TableSource<K,V> source)
Pipeline
TableSource
instances that map to
PTable
s.
read
in interface Pipeline
source
- The source of the data
public PCollection<String> readTextFile(String pathName)
Pipeline
readTextFile
in interface Pipeline
public void write(PCollection<?> pcollection, Target target)
Pipeline
WriteMode.DEFAULT
rule for the given Target
.
write
in interface Pipeline
pcollection
- The collectiontarget
- The output targetpublic void write(PCollection<?> pcollection, Target target, Target.WriteMode writeMode)
Pipeline
PCollection
to the given Target
,
using the storage format specified by the target and the given
WriteMode
for cases where the referenced Target
already exists.
write
in interface Pipeline
pcollection
- The collectiontarget
- The target to write towriteMode
- The strategy to use for handling existing outputspublic <T> Iterable<T> materialize(PCollection<T> pcollection)
Pipeline
materialize
in interface Pipeline
pcollection
- The PCollection to materialize
public <T> ReadableSource<T> getMaterializeSourceTarget(PCollection<T> pcollection)
PCollection
.
This is primarily intended as a helper method to materialize(PCollection)
. The
underlying data of the ReadableSourceTarget may not be actually present until the pipeline is
run.
pcollection
- The collection for which the ReadableSourceTarget is to be retrieved
IllegalArgumentException
- If no ReadableSourceTarget can be retrieved for the given
PCollectionpublic <T> SourceTarget<T> createIntermediateOutput(PType<T> ptype)
public org.apache.hadoop.fs.Path createTempPath()
public <T> void writeTextFile(PCollection<T> pcollection, String pathName)
Pipeline
writeTextFile
in interface Pipeline
public int getNextAnonymousStageId()
public void enableDebug()
Pipeline
enableDebug
in interface Pipeline
public String getName()
Pipeline
getName
in interface Pipeline
|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |