java.lang.Object
  org.apache.crunch.impl.dist.DistributedPipeline

public abstract class DistributedPipeline
extends Object
implements Pipeline

| Constructor Summary | |
|---|---|
| DistributedPipeline(String name, org.apache.hadoop.conf.Configuration conf, PCollectionFactory factory) | Instantiate with a custom name and configuration. |

| Method Summary | | |
|---|---|---|
| void | cleanup(boolean force) | Cleans up any artifacts created as a result of running the pipeline. |
| <T> SourceTarget<T> | createIntermediateOutput(PType<T> ptype) | |
| org.apache.hadoop.fs.Path | createTempPath() | |
| PipelineResult | done() | Run any remaining jobs required to generate outputs and then clean up any intermediate data files that were created in this run or previous calls to run. |
| <S> PCollection<S> | emptyPCollection(PType<S> ptype) | |
| <K,V> PTable<K,V> | emptyPTable(PTableType<K,V> ptype) | |
| void | enableDebug() | Turn on debug logging for jobs that are run from this pipeline. |
| org.apache.hadoop.conf.Configuration | getConfiguration() | Returns the Configuration instance associated with this pipeline. |
| PCollectionFactory | getFactory() | |
| <T> ReadableSource<T> | getMaterializeSourceTarget(PCollection<T> pcollection) | Retrieve a ReadableSourceTarget that provides access to the contents of a PCollection. |
| String | getName() | Returns the name of this pipeline. |
| int | getNextAnonymousStageId() | |
| <S> PCollection<S> | read(Source<S> source) | Converts the given Source into a PCollection that is available to jobs run using this Pipeline instance. |
| <K,V> PTable<K,V> | read(TableSource<K,V> source) | A version of the read method for TableSource instances that map to PTables. |
| PCollection<String> | readTextFile(String pathName) | A convenience method for reading a text file. |
| void | setConfiguration(org.apache.hadoop.conf.Configuration conf) | Set the Configuration to use with this pipeline. |
| void | write(PCollection<?> pcollection, Target target) | Write the given collection to the given target on the next pipeline run. |
| void | write(PCollection<?> pcollection, Target target, Target.WriteMode writeMode) | Write the contents of the PCollection to the given Target, using the storage format specified by the target and the given WriteMode for cases where the referenced Target already exists. |
| <T> void | writeTextFile(PCollection<T> pcollection, String pathName) | A convenience method for writing a text file. |

| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Methods inherited from interface org.apache.crunch.Pipeline |
|---|
| cache, materialize, run, runAsync |

| Constructor Detail |
|---|
public DistributedPipeline(String name,
org.apache.hadoop.conf.Configuration conf,
PCollectionFactory factory)
Instantiate with a custom name and configuration.
Parameters:
name - Display name of the pipeline
conf - Configuration to be used within all MapReduce jobs run in the pipeline

| Method Detail |
|---|
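public PCollectionFactory getFactory()

public org.apache.hadoop.conf.Configuration getConfiguration()
Description copied from interface: Pipeline
Returns the Configuration instance associated with this pipeline.
Specified by: getConfiguration in interface Pipeline

public void setConfiguration(org.apache.hadoop.conf.Configuration conf)
Description copied from interface: Pipeline
Set the Configuration to use with this pipeline.
Specified by: setConfiguration in interface Pipeline

public PipelineResult done()
Description copied from interface: Pipeline
Run any remaining jobs required to generate outputs and then clean up any intermediate data files that were created in this run or previous calls to run.
Specified by: done in interface Pipeline

public <S> PCollection<S> read(Source<S> source)
Description copied from interface: Pipeline
Converts the given Source into a PCollection that is available to jobs run using this Pipeline instance.
Specified by: read in interface Pipeline
Parameters:
source - The source of data

public <K,V> PTable<K,V> read(TableSource<K,V> source)
Description copied from interface: Pipeline
A version of the read method for TableSource instances that map to PTables.
Specified by: read in interface Pipeline
Parameters:
source - The source of the data

public PCollection<String> readTextFile(String pathName)
Description copied from interface: Pipeline
A convenience method for reading a text file.
Specified by: readTextFile in interface Pipeline

public void write(PCollection<?> pcollection, Target target)
Description copied from interface: Pipeline
Write the given collection to the given target on the next pipeline run. An existing target is handled using the WriteMode.DEFAULT rule for the given Target.
Specified by: write in interface Pipeline
Parameters:
pcollection - The collection
target - The output target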
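
What happens when a write target already exists is decided by a WriteMode. The sketch below is a rough, self-contained illustration of that idea and is not Crunch code: `WriteModeSketch`, its `Mode` enum, and the in-memory `store` map are hypothetical stand-ins, and the behaviour assigned to each mode (fail, replace, append) is an assumption about how such strategies usually work. It also applies writes immediately, whereas the documented `write` defers them to the next pipeline run. Consult the Target.WriteMode documentation for the actual rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; these are NOT the Crunch classes.
class WriteModeSketch {
    // Assumed strategies for a target that already exists.
    enum Mode { DEFAULT, OVERWRITE, APPEND }

    // In-memory "storage": target name -> records written so far.
    private final Map<String, List<String>> store = new HashMap<>();

    // The two-argument form delegates to the DEFAULT strategy.
    void write(List<String> records, String target) {
        write(records, target, Mode.DEFAULT);
    }

    void write(List<String> records, String target, Mode mode) {
        List<String> existing = store.get(target);
        if (existing == null) {
            // Nothing there yet, so every mode behaves the same.
            store.put(target, new ArrayList<>(records));
            return;
        }
        switch (mode) {
            case DEFAULT:
                // Assumption: refuse to touch existing output.
                throw new IllegalStateException("Target already exists: " + target);
            case OVERWRITE:
                store.put(target, new ArrayList<>(records));
                break;
            case APPEND:
                existing.addAll(records);
                break;
        }
    }

    List<String> contents(String target) {
        return store.getOrDefault(target, new ArrayList<>());
    }

    public static void main(String[] args) {
        WriteModeSketch sketch = new WriteModeSketch();
        sketch.write(Arrays.asList("a"), "out");
        sketch.write(Arrays.asList("b"), "out", Mode.APPEND);
        sketch.write(Arrays.asList("c"), "out", Mode.OVERWRITE);
        System.out.println(sketch.contents("out")); // prints [c]
    }
}
```

Routing the two-argument overload through the three-argument one keeps the existence check in a single place, which mirrors how the two documented `write` signatures relate to each other.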
public void write(PCollection<?> pcollection, Target target, Target.WriteMode writeMode)
Description copied from interface: Pipeline
Write the contents of the PCollection to the given Target, using the storage format specified by the target and the given WriteMode for cases where the referenced Target already exists.
Specified by: write in interface Pipeline
Parameters:
pcollection - The collection
target - The target to write to
writeMode - The strategy to use for handling existing outputs

public <S> PCollection<S> emptyPCollection(PType<S> ptype)
Specified by: emptyPCollection in interface Pipeline

public <K,V> PTable<K,V> emptyPTable(PTableType<K,V> ptype)
Specified by: emptyPTable in interface Pipeline

public <T> ReadableSource<T> getMaterializeSourceTarget(PCollection<T> pcollection)
Retrieve a ReadableSourceTarget that provides access to the contents of a PCollection. This is primarily intended as a helper method to Pipeline.materialize(PCollection). The underlying data of the ReadableSourceTarget may not actually be present until the pipeline is run.
Parameters:
pcollection - The collection for which the ReadableSourceTarget is to be retrieved
Throws:
IllegalArgumentException - If no ReadableSourceTarget can be retrieved for the given PCollection

public <T> SourceTarget<T> createIntermediateOutput(PType<T> ptype)
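
The note on getMaterializeSourceTarget that the underlying data may not be present until the pipeline is run describes a handle that is handed out before any work happens. The following is a minimal sketch of that pattern only, not the Crunch implementation: `LazyMaterializeSketch`, `Handle`, `plan`, and `getMaterializeHandle` are hypothetical names, and collections are identified by plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in; these are NOT the Crunch classes.
class LazyMaterializeSketch {
    // A readable handle whose data shows up only after run().
    static final class Handle {
        private List<String> data; // null until the pipeline has run

        boolean isPresent() {
            return data != null;
        }

        List<String> read() {
            if (data == null) {
                throw new IllegalStateException("Pipeline has not been run yet");
            }
            return data;
        }
    }

    private final Map<String, Supplier<List<String>>> planned = new HashMap<>();
    private final Map<String, Handle> handles = new HashMap<>();

    // Register a named collection together with the work that produces it.
    void plan(String name, Supplier<List<String>> producer) {
        planned.put(name, producer);
    }

    // Hand out a handle immediately; no work is done here.
    Handle getMaterializeHandle(String name) {
        if (!planned.containsKey(name)) {
            throw new IllegalArgumentException("No readable handle for: " + name);
        }
        return handles.computeIfAbsent(name, n -> new Handle());
    }

    // Execute the planned work and fill in every handle handed out so far.
    void run() {
        for (Map.Entry<String, Handle> e : handles.entrySet()) {
            e.getValue().data = new ArrayList<>(planned.get(e.getKey()).get());
        }
    }

    public static void main(String[] args) {
        LazyMaterializeSketch pipeline = new LazyMaterializeSketch();
        pipeline.plan("words", () -> Arrays.asList("x", "y"));
        Handle handle = pipeline.getMaterializeHandle("words");
        System.out.println(handle.isPresent()); // prints false
        pipeline.run();
        System.out.println(handle.read()); // prints [x, y]
    }
}
```
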
public org.apache.hadoop.fs.Path createTempPath()
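public <T> void writeTextFile(PCollection<T> pcollection, String pathName)
Description copied from interface: Pipeline
A convenience method for writing a text file.
Specified by: writeTextFile in interface Pipeline

public void cleanup(boolean force)
Description copied from interface: Pipeline
Cleans up any artifacts created as a result of running the pipeline.
Specified by: cleanup in interface Pipeline
Parameters:
force - Forces the cleanup even if not all targets of the pipeline have been completed.

public int getNextAnonymousStageId()

public void enableDebug()
Description copied from interface: Pipeline
Turn on debug logging for jobs that are run from this pipeline.
Specified by: enableDebug in interface Pipeline

public String getName()
Description copied from interface: Pipeline
Returns the name of this pipeline.
Specified by: getName in interface Pipeline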
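
Taken together, the descriptions of readTextFile, writeTextFile, done, and cleanup suggest a lifecycle in which writes are recorded first and carried out later, with intermediate files removed at the end. The sketch below models that lifecycle in memory so it can run without Hadoop. It is an illustration under stated assumptions, not the Crunch implementation: `LifecycleSketch` and all of its fields are hypothetical, paths are plain strings, and the rule that an unforced cleanup is skipped while writes are still pending is an assumed reading of the `force` parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in; these are NOT the Crunch classes.
class LifecycleSketch {
    private final Map<String, List<String>> files = new HashMap<>(); // pretend file system
    private final List<String[]> pendingWrites = new ArrayList<>();  // {source, target} pairs not yet run
    private final List<String> tempFiles = new ArrayList<>();        // intermediate artifacts
    private int nextTempId = 0;

    LifecycleSketch(Map<String, List<String>> initialFiles) {
        files.putAll(initialFiles);
    }

    // Reading only names the input; nothing is copied yet.
    String readTextFile(String pathName) {
        return pathName;
    }

    // Writing is recorded and deferred until run() or done().
    void writeTextFile(String source, String pathName) {
        pendingWrites.add(new String[] {source, pathName});
    }

    // Execute every deferred write, staging each through a temp file.
    void run() {
        for (String[] w : pendingWrites) {
            List<String> input = files.get(w[0]);
            if (input == null) {
                throw new IllegalArgumentException("Missing input: " + w[0]);
            }
            String tmp = "tmp-" + nextTempId++;
            files.put(tmp, new ArrayList<>(input));
            tempFiles.add(tmp);
            files.put(w[1], new ArrayList<>(input));
        }
        pendingWrites.clear();
    }

    // Remove temp files; without force, do nothing while writes are pending.
    void cleanup(boolean force) {
        if (!force && !pendingWrites.isEmpty()) {
            return;
        }
        for (String tmp : tempFiles) {
            files.remove(tmp);
        }
        tempFiles.clear();
    }

    // Run whatever is left, then clean up intermediate files.
    void done() {
        run();
        cleanup(false);
    }

    boolean exists(String pathName) {
        return files.containsKey(pathName);
    }

    List<String> contents(String pathName) {
        return files.get(pathName);
    }

    public static void main(String[] args) {
        Map<String, List<String>> fs = new HashMap<>();
        fs.put("in.txt", Arrays.asList("hello", "world"));
        LifecycleSketch pipeline = new LifecycleSketch(fs);
        pipeline.writeTextFile(pipeline.readTextFile("in.txt"), "out.txt");
        System.out.println(pipeline.exists("out.txt")); // prints false: nothing has run yet
        pipeline.done();
        System.out.println(pipeline.contents("out.txt")); // prints [hello, world]
    }
}
```
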