java.lang.Object
  org.apache.crunch.impl.dist.DistributedPipeline
    org.apache.crunch.impl.spark.SparkPipeline

public class SparkPipeline
extends DistributedPipeline
| Constructor Summary |
|---|
| SparkPipeline(org.apache.spark.api.java.JavaSparkContext sparkContext, String appName) |
| SparkPipeline(String sparkConnect, String appName) |
| SparkPipeline(String sparkConnect, String appName, Class<?> jarClass) |
| Method Summary | | |
|---|---|---|
| <T> void | cache(PCollection<T> pcollection, CachingOptions options) | Caches the given PCollection so that it will be processed at most once during pipeline execution. |
| PipelineResult | done() | Run any remaining jobs required to generate outputs and then clean up any intermediate data files that were created in this run or previous calls to run. |
| <S> PCollection<S> | emptyPCollection(PType<S> ptype) | |
| <K,V> PTable<K,V> | emptyPTable(PTableType<K,V> ptype) | |
| <T> Iterable<T> | materialize(PCollection<T> pcollection) | Create the given PCollection and read the data it contains into the returned Collection instance for client use. |
| PipelineResult | run() | Constructs and executes a series of MapReduce jobs in order to write data to the output targets. |
| PipelineExecution | runAsync() | Constructs and starts a series of MapReduce jobs in order to write data to the output targets, but returns a ListenableFuture to allow clients to control job execution. |
| Methods inherited from class org.apache.crunch.impl.dist.DistributedPipeline |
|---|
| cleanup, createIntermediateOutput, createTempPath, enableDebug, getConfiguration, getFactory, getMaterializeSourceTarget, getName, getNextAnonymousStageId, read, read, readTextFile, setConfiguration, write, write, writeTextFile |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public SparkPipeline(String sparkConnect,
String appName)
public SparkPipeline(String sparkConnect,
String appName,
Class<?> jarClass)
public SparkPipeline(org.apache.spark.api.java.JavaSparkContext sparkContext,
String appName)
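
A minimal usage sketch of the third constructor above, run against a local Spark master. The "local" master string, the application name, the paths, and the WordCountDriver class are illustrative placeholders rather than values taken from this page; readTextFile, writeTextFile, and done are the inherited Pipeline methods listed in the summaries above.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.impl.spark.SparkPipeline;

public class WordCountDriver {
  public static void main(String[] args) {
    // sparkConnect is the Spark master to connect to; "local" runs in-process
    // (illustrative value). jarClass is typically used to locate the jar that
    // contains the job classes (an assumption; not documented on this page).
    Pipeline pipeline =
        new SparkPipeline("local", "spark-pipeline-example", WordCountDriver.class);

    // readTextFile/writeTextFile are inherited from DistributedPipeline (see above).
    PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt");
    pipeline.writeTextFile(lines, "/tmp/output");

    // done() runs the pending jobs and cleans up intermediate data.
    PipelineResult result = pipeline.done();
  }
}
```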
| Method Detail |
|---|
public <T> Iterable<T> materialize(PCollection<T> pcollection)

Description copied from interface: Pipeline
Create the given PCollection and read the data it contains into the returned Collection instance for client use.

Parameters:
pcollection - The PCollection to materialize
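
A brief sketch of materialize for client-side iteration, assuming a pipeline constructed as in the earlier example (imports shown there) and a placeholder input path.

```java
static void printLines(Pipeline pipeline) {
  PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt");

  // materialize returns the PCollection's contents as an Iterable for client use.
  Iterable<String> contents = pipeline.materialize(lines);
  for (String line : contents) {
    System.out.println(line);
  }
}
```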
public <S> PCollection<S> emptyPCollection(PType<S> ptype)

Specified by: emptyPCollection in interface Pipeline
Overrides: emptyPCollection in class DistributedPipeline

public <K,V> PTable<K,V> emptyPTable(PTableType<K,V> ptype)

Specified by: emptyPTable in interface Pipeline
Overrides: emptyPTable in class DistributedPipeline
public <T> void cache(PCollection<T> pcollection,
                      CachingOptions options)

Description copied from interface: Pipeline
Caches the given PCollection so that it will be processed at most once during pipeline execution.

Parameters:
pcollection - The PCollection to cache
options - The options for how the cached data is stored

public PipelineResult run()

Description copied from interface: Pipeline
Constructs and executes a series of MapReduce jobs in order to write data to the output targets.
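
A sketch combining cache and run, again assuming the pipeline and imports from the constructor example plus org.apache.crunch.CachingOptions. CachingOptions.DEFAULT is an assumption about that class; substitute whatever options instance your Crunch version provides.

```java
static PipelineResult writeTwice(Pipeline pipeline) {
  PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt");

  // The collection feeds two outputs, so cache it to process it at most once.
  // CachingOptions.DEFAULT is assumed to be the library's default storage options.
  pipeline.cache(lines, CachingOptions.DEFAULT);

  pipeline.writeTextFile(lines, "/tmp/out-a");
  pipeline.writeTextFile(lines, "/tmp/out-b");

  // run() constructs and executes the jobs needed to produce both outputs.
  return pipeline.run();
}
```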
public PipelineExecution runAsync()

Description copied from interface: Pipeline
Constructs and starts a series of MapReduce jobs in order to write data to the output targets, but returns a ListenableFuture to allow clients to control job execution.
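
A sketch of the asynchronous variant. The description above says runAsync returns a ListenableFuture; the get() call below assumes PipelineExecution follows the java.util.concurrent.Future contract and blocks until the jobs complete.

```java
static PipelineResult runInBackground(Pipeline pipeline) throws Exception {
  // Start the jobs without blocking the calling thread.
  PipelineExecution execution = pipeline.runAsync();

  // ... other client-side work can happen here while the jobs run ...

  // Assumption: PipelineExecution exposes the Future/ListenableFuture contract,
  // so get() blocks until execution finishes and yields the PipelineResult.
  return execution.get();
}
```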
public PipelineResult done()

Description copied from interface: Pipeline
Run any remaining jobs required to generate outputs and then clean up any intermediate data files that were created in this run or previous calls to run.

Specified by: done in interface Pipeline
Overrides: done in class DistributedPipeline
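
A sketch of the run/done lifecycle described above: run() may be called as outputs are added incrementally, and the final done() executes anything still pending and then removes the intermediate files created by this and earlier runs. The output paths are placeholders.

```java
static PipelineResult runInTwoPhases(Pipeline pipeline, PCollection<String> lines) {
  pipeline.writeTextFile(lines, "/tmp/out-first");
  pipeline.run();  // executes the jobs required for the first output

  pipeline.writeTextFile(lines, "/tmp/out-second");
  // done() runs the remaining jobs and then cleans up intermediate data files.
  return pipeline.done();
}
```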