|
|||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||
java.lang.Objectorg.apache.crunch.impl.mr.MRPipeline
public class MRPipeline
Pipeline implementation that is executed within Hadoop MapReduce.
| Constructor Summary | |
|---|---|
MRPipeline(Class<?> jarClass)
Instantiate with a default Configuration and name. |
|
MRPipeline(Class<?> jarClass,
org.apache.hadoop.conf.Configuration conf)
Instantiate with a custom configuration and default naming. |
|
MRPipeline(Class<?> jarClass,
String name)
Instantiate with a custom pipeline name. |
|
MRPipeline(Class<?> jarClass,
String name,
org.apache.hadoop.conf.Configuration conf)
Instantiate with a custom name and configuration. |
|
| Method Summary | ||
|---|---|---|
|
createIntermediateOutput(PType<T> ptype)
|
|
org.apache.hadoop.fs.Path |
createTempPath()
|
|
PipelineResult |
done()
Run any remaining jobs required to generate outputs and then clean up any intermediate data files that were created in this run or previous calls to run. |
|
void |
enableDebug()
Turn on debug logging for jobs that are run from this pipeline. |
|
org.apache.hadoop.conf.Configuration |
getConfiguration()
Returns the Configuration instance associated with this pipeline. |
|
|
getMaterializeSourceTarget(PCollection<T> pcollection)
Retrieve a ReadableSourceTarget that provides access to the contents of a PCollection. |
|
String |
getName()
Returns the name of this pipeline. |
|
int |
getNextAnonymousStageId()
|
|
|
materialize(PCollection<T> pcollection)
Create the given PCollection and read the data it contains into the returned Collection instance for client use. |
|
org.apache.crunch.impl.mr.exec.MRExecutor |
plan()
|
|
|
read(Source<S> source)
Converts the given Source into a PCollection that is
available to jobs run using this Pipeline instance. |
|
|
read(TableSource<K,V> source)
A version of the read method for TableSource instances that map to
PTables. |
|
PCollection<String> |
readTextFile(String pathName)
A convenience method for reading a text file. |
|
PipelineResult |
run()
Constructs and executes a series of MapReduce jobs in order to write data to the output targets. |
|
void |
setConfiguration(org.apache.hadoop.conf.Configuration conf)
Set the Configuration to use with this pipeline. |
|
void |
write(PCollection<?> pcollection,
Target target)
Write the given collection to the given target on the next pipeline run. |
|
|
writeTextFile(PCollection<T> pcollection,
String pathName)
A convenience method for writing a text file. |
|
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public MRPipeline(Class<?> jarClass)
jarClass - Class containing the main driver method for running the pipeline
public MRPipeline(Class<?> jarClass,
String name)
jarClass - Class containing the main driver method for running the pipelinename - Display name of the pipeline
public MRPipeline(Class<?> jarClass,
org.apache.hadoop.conf.Configuration conf)
jarClass - Class containing the main driver method for running the pipelineconf - Configuration to be used within all MapReduce jobs run in the pipeline
public MRPipeline(Class<?> jarClass,
String name,
org.apache.hadoop.conf.Configuration conf)
jarClass - Class containing the main driver method for running the pipelinename - Display name of the pipelineconf - Configuration to be used within all MapReduce jobs run in the pipeline| Method Detail |
|---|
public org.apache.hadoop.conf.Configuration getConfiguration()
PipelineConfiguration instance associated with this pipeline.
getConfiguration in interface Pipelinepublic void setConfiguration(org.apache.hadoop.conf.Configuration conf)
PipelineConfiguration to use with this pipeline.
setConfiguration in interface Pipelinepublic org.apache.crunch.impl.mr.exec.MRExecutor plan()
public PipelineResult run()
Pipeline
run in interface Pipelinepublic PipelineResult done()
Pipelinerun.
done in interface Pipelinepublic <S> PCollection<S> read(Source<S> source)
PipelineSource into a PCollection that is
available to jobs run using this Pipeline instance.
read in interface Pipelinesource - The source of data
public <K,V> PTable<K,V> read(TableSource<K,V> source)
PipelineTableSource instances that map to
PTables.
read in interface Pipelinesource - The source of the data
public PCollection<String> readTextFile(String pathName)
Pipeline
readTextFile in interface Pipeline
public void write(PCollection<?> pcollection,
Target target)
Pipeline
write in interface Pipelinepcollection - The collectiontarget - The output targetpublic <T> Iterable<T> materialize(PCollection<T> pcollection)
Pipeline
materialize in interface Pipelinepcollection - The PCollection to materialize
public <T> ReadableSourceTarget<T> getMaterializeSourceTarget(PCollection<T> pcollection)
PCollection.
This is primarily intended as a helper method to materialize(PCollection). The
underlying data of the ReadableSourceTarget may not be actually present until the pipeline is
run.
pcollection - The collection for which the ReadableSourceTarget is to be retrieved
IllegalArgumentException - If no ReadableSourceTarget can be retrieved for the given
PCollectionpublic <T> SourceTarget<T> createIntermediateOutput(PType<T> ptype)
public org.apache.hadoop.fs.Path createTempPath()
public <T> void writeTextFile(PCollection<T> pcollection,
String pathName)
Pipeline
writeTextFile in interface Pipelinepublic int getNextAnonymousStageId()
public void enableDebug()
Pipeline
enableDebug in interface Pipelinepublic String getName()
Pipeline
getName in interface Pipeline
|
|||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||