cache(PCollection<T> pcollection, CachingOptions options)
Caches the given PCollection so that it will be processed at most once
during pipeline execution.
done()
Run any remaining jobs required to generate outputs and then clean up any
intermediate data files that were created in this run or previous calls to
run.
materialize(PCollection<T> pcollection)
Create the given PCollection and read the data it contains into the
returned Collection instance for client use.
runAsync()
Constructs and starts a series of MapReduce jobs in order to write data to
the output targets, but returns a ListenableFuture to allow clients to control
job execution.
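Because ListenableFuture extends java.util.concurrent.Future, the handle returned by runAsync() can be driven with the standard Future operations (get with a timeout, cancel, isDone). The sketch below illustrates that control pattern using only the JDK. It is not Crunch code: startJobs, awaitOrCancel, the sleep that stands in for the MapReduce jobs, and the status strings are all hypothetical placeholders chosen for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AsyncControlSketch {

    // Hypothetical stand-in for runAsync(): starts the work in the background
    // and immediately returns a Future to the caller. NOT the Crunch API.
    static Future<String> startJobs(ExecutorService executor, long workMillis) {
        return executor.submit(() -> {
            Thread.sleep(workMillis); // placeholder for the MapReduce jobs
            return "SUCCEEDED";
        });
    }

    // Waits up to timeoutMillis for the result. If the work has not finished
    // by then, the caller cancels it instead of blocking indefinitely.
    static String awaitOrCancel(Future<String> execution, long timeoutMillis)
            throws Exception {
        try {
            return execution.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            execution.cancel(true); // interrupts the background work
            return "CANCELLED";
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Short job: finishes well within the timeout.
            System.out.println(awaitOrCancel(startJobs(executor, 50), 5_000));
            // Long job: exceeds the timeout and is cancelled by the client.
            System.out.println(awaitOrCancel(startJobs(executor, 60_000), 100));
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The contrast with done() is that done() blocks until the remaining jobs finish, whereas a future-returning call leaves the decision to wait, poll, or cancel with the client.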
Methods inherited from class org.apache.crunch.impl.dist.DistributedPipeline