org.apache.crunch.impl.spark
Class SparkPipeline

java.lang.Object
  extended by org.apache.crunch.impl.dist.DistributedPipeline
      extended by org.apache.crunch.impl.spark.SparkPipeline

public class SparkPipeline
extends DistributedPipeline
Constructor Summary

SparkPipeline(org.apache.spark.api.java.JavaSparkContext sparkContext, String appName)

SparkPipeline(String sparkConnect, String appName)

SparkPipeline(String sparkConnect, String appName, Class<?> jarClass)
Method Summary

<T> void            cache(PCollection<T> pcollection, CachingOptions options)
                    Caches the given PCollection so that it will be processed at most once during pipeline execution.

PipelineResult      done()
                    Run any remaining jobs required to generate outputs and then clean up any intermediate data files that were created in this run or previous calls to run.

<S> PCollection<S>  emptyPCollection(PType<S> ptype)

<K,V> PTable<K,V>   emptyPTable(PTableType<K,V> ptype)

<T> Iterable<T>     materialize(PCollection<T> pcollection)
                    Create the given PCollection and read the data it contains into the returned Collection instance for client use.

PipelineResult      run()
                    Constructs and executes a series of MapReduce jobs in order to write data to the output targets.

PipelineExecution   runAsync()
                    Constructs and starts a series of MapReduce jobs in order to write data to the output targets, but returns a ListenableFuture to allow clients to control job execution.
Methods inherited from class org.apache.crunch.impl.dist.DistributedPipeline
cleanup, createIntermediateOutput, createTempPath, enableDebug, getConfiguration, getFactory, getMaterializeSourceTarget, getName, getNextAnonymousStageId, read, read, readTextFile, setConfiguration, write, write, writeTextFile

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

public SparkPipeline(String sparkConnect, String appName)

public SparkPipeline(String sparkConnect, String appName, Class<?> jarClass)

public SparkPipeline(org.apache.spark.api.java.JavaSparkContext sparkContext, String appName)
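The following is a minimal construction sketch, not part of the original Javadoc. The "local" master string, the app names, and the example class passed as jarClass are illustrative assumptions; the SparkPipeline constructors themselves are the ones listed above.

import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.spark.SparkPipeline;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkPipelineConstruction {
  public static void main(String[] args) {
    // Let the pipeline build its own SparkContext from a connect string; jarClass tells
    // Spark which jar to ship to the executors ("local" is an illustrative master).
    Pipeline fromConnect =
        new SparkPipeline("local", "crunch-example", SparkPipelineConstruction.class);

    // Or pass in an existing JavaSparkContext whose lifecycle the caller manages.
    // A real application would normally use only one of these two styles.
    JavaSparkContext context = new JavaSparkContext("local", "crunch-example");
    Pipeline fromContext = new SparkPipeline(context, "crunch-example");
  }
}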
Method Detail

materialize

public <T> Iterable<T> materialize(PCollection<T> pcollection)

Description copied from interface: Pipeline
Create the given PCollection and read the data it contains into the returned Collection instance for client use.

Parameters:
    pcollection - The PCollection to materialize
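A sketch of reading a materialized PCollection on the client. It relies on the inherited readTextFile method listed above; the input path and class name are illustrative assumptions.

import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.spark.SparkPipeline;

public class MaterializeExample {
  public static void main(String[] args) {
    Pipeline pipeline = new SparkPipeline("local", "materialize-example", MaterializeExample.class);
    PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt"); // assumed input path

    // materialize runs whatever jobs are needed to produce the data and
    // exposes the result to the client as an Iterable.
    Iterable<String> contents = pipeline.materialize(lines);
    for (String line : contents) {
      System.out.println(line);
    }
    pipeline.done();
  }
}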
emptyPCollection

public <S> PCollection<S> emptyPCollection(PType<S> ptype)

Specified by:
    emptyPCollection in interface Pipeline
Overrides:
    emptyPCollection in class DistributedPipeline

emptyPTable

public <K,V> PTable<K,V> emptyPTable(PTableType<K,V> ptype)

Specified by:
    emptyPTable in interface Pipeline
Overrides:
    emptyPTable in class DistributedPipeline
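A sketch of using the empty factory methods, for example as placeholders when an optional input is absent. The PTypes from org.apache.crunch.types.writable.Writables are assumptions made for illustration, not part of this class.

import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.spark.SparkPipeline;
import org.apache.crunch.types.writable.Writables;

public class EmptyCollectionsExample {
  public static void main(String[] args) {
    Pipeline pipeline = new SparkPipeline("local", "empty-example", EmptyCollectionsExample.class);

    // Empty placeholders that can stand in for a missing input, e.g. in a union.
    PCollection<String> noLines = pipeline.emptyPCollection(Writables.strings());
    PTable<String, Long> noCounts =
        pipeline.emptyPTable(Writables.tableOf(Writables.strings(), Writables.longs()));

    pipeline.done();
  }
}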
cache

public <T> void cache(PCollection<T> pcollection, CachingOptions options)

Description copied from interface: Pipeline
Caches the given PCollection so that it will be processed at most once during pipeline execution.

Parameters:
    pcollection - The PCollection to cache
    options - The options for how the cached data is stored

run

public PipelineResult run()

Description copied from interface: Pipeline
Constructs and executes a series of MapReduce jobs in order to write data to the output targets.
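A sketch of caching a collection that feeds two outputs and then running the pipeline. The paths are placeholders, and CachingOptions.DEFAULT is an assumed constant used only for illustration.

import org.apache.crunch.CachingOptions;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.impl.spark.SparkPipeline;

public class CacheAndRunExample {
  public static void main(String[] args) {
    Pipeline pipeline = new SparkPipeline("local", "cache-example", CacheAndRunExample.class);
    PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt"); // assumed path

    // Evaluate `lines` at most once even though two outputs consume it.
    pipeline.cache(lines, CachingOptions.DEFAULT); // CachingOptions.DEFAULT is assumed

    pipeline.writeTextFile(lines, "/tmp/copy-a"); // assumed output paths
    pipeline.writeTextFile(lines, "/tmp/copy-b");

    // run() blocks until the jobs finish and reports their overall status.
    PipelineResult result = pipeline.run();
    System.out.println("succeeded: " + result.succeeded());
    pipeline.done();
  }
}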
runAsync

public PipelineExecution runAsync()

Description copied from interface: Pipeline
Constructs and starts a series of MapReduce jobs in order to write data to the output targets, but returns a ListenableFuture to allow clients to control job execution.
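A sketch of asynchronous execution. The waitUntilDone() and getStatus() calls on PipelineExecution are assumed methods, shown only to illustrate client-side control, and the paths are placeholders.

import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineExecution;
import org.apache.crunch.impl.spark.SparkPipeline;

public class RunAsyncExample {
  public static void main(String[] args) throws Exception {
    Pipeline pipeline = new SparkPipeline("local", "async-example", RunAsyncExample.class);
    PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt"); // assumed path
    pipeline.writeTextFile(lines, "/tmp/output");                        // assumed path

    // runAsync returns immediately; the PipelineExecution acts as a
    // ListenableFuture over the eventual PipelineResult.
    PipelineExecution execution = pipeline.runAsync();

    // The client keeps control: it can block, poll, or cancel the execution.
    execution.waitUntilDone();                                    // assumed method
    System.out.println("final status: " + execution.getStatus()); // assumed method

    pipeline.done();
  }
}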
done

public PipelineResult done()

Description copied from interface: Pipeline
Run any remaining jobs required to generate outputs and then clean up any intermediate data files that were created in this run or previous calls to run.

Specified by:
    done in interface Pipeline
Overrides:
    done in class DistributedPipeline
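A sketch of the intended lifecycle: intermediate run() calls may leave temporary data behind, and a final done() runs anything still pending and cleans all of it up. Paths and the class name are illustrative assumptions.

import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.impl.spark.SparkPipeline;

public class DoneExample {
  public static void main(String[] args) {
    Pipeline pipeline = new SparkPipeline("local", "done-example", DoneExample.class);

    PCollection<String> first = pipeline.readTextFile("/tmp/input-one.txt"); // assumed paths
    pipeline.writeTextFile(first, "/tmp/stage-one");
    pipeline.run(); // executes the work queued so far; intermediate files may remain

    PCollection<String> second = pipeline.readTextFile("/tmp/input-two.txt");
    pipeline.writeTextFile(second, "/tmp/stage-two");

    // done() runs the remaining jobs, then removes the intermediate data created
    // by this call and by the earlier run(), and shuts the pipeline down.
    PipelineResult result = pipeline.done();
    System.out.println("succeeded: " + result.succeeded());
  }
}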