SparkPipeline (Apache Crunch 0.9.0 API)

This project has retired. For details please refer to its Attic page.

org.apache.crunch.impl.spark
Class SparkPipeline

java.lang.Object
  org.apache.crunch.impl.dist.DistributedPipeline
      org.apache.crunch.impl.spark.SparkPipeline

All Implemented Interfaces: Pipeline

public class SparkPipeline
extends DistributedPipeline

Field Summary

Fields inherited from class org.apache.crunch.impl.dist.DistributedPipeline

factory, outputTargets, outputTargetsToMaterialize

Constructor Summary

SparkPipeline(org.apache.spark.api.java.JavaSparkContext sparkContext, String appName)


SparkPipeline(String sparkConnect, String appName)


Method Summary

<T> void cache(PCollection<T> pcollection, CachingOptions options)
          Caches the given PCollection so that it will be processed at most once during pipeline execution.

PipelineResult done()
          Run any remaining jobs required to generate outputs and then clean up any intermediate data files that were created in this run or previous calls to run.

<T> Iterable<T> materialize(PCollection<T> pcollection)
          Create the given PCollection and read the data it contains into the returned Collection instance for client use.

PipelineResult run()
          Constructs and executes a series of MapReduce jobs in order to write data to the output targets.

PipelineExecution runAsync()
          Constructs and starts a series of MapReduce jobs in order to write data to the output targets, but returns a ListenableFuture to allow clients to control job execution.

Methods inherited from class org.apache.crunch.impl.dist.DistributedPipeline

cleanup, createIntermediateOutput, createTempPath, enableDebug, getConfiguration, getFactory, getMaterializeSourceTarget, getName, getNextAnonymousStageId, read, read, readTextFile, setConfiguration, write, write, writeTextFile

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
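
The "processed at most once" wording in the cache summary above can be pictured with a plain-Java analogy. The sketch below uses no Crunch or Spark classes, and it is not how SparkPipeline implements caching; the class name AtMostOnceSketch and the helper Memo are hypothetical, introduced only for illustration. It wraps a computation so that repeated reads reuse the first result, and a computation that is never read is never run.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Plain-Java analogy only: none of these types come from Crunch or Spark.
class AtMostOnceSketch {

    /** Hypothetical helper: wraps a computation so that it runs at most once. */
    static final class Memo<T> implements Supplier<T> {
        private final Supplier<T> delegate;
        private boolean computed;
        private T value;

        Memo(Supplier<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized T get() {
            if (!computed) {
                // First read: run the underlying computation and remember the result.
                value = delegate.get();
                computed = true;
            }
            // Later reads: reuse the remembered result.
            return value;
        }
    }

    /** Reads a memoized computation twice and returns how often it actually ran. */
    static int timesComputed() {
        AtomicInteger runs = new AtomicInteger();
        Supplier<Integer> expensive = () -> {
            runs.incrementAndGet();
            return 42;
        };
        Supplier<Integer> cached = new Memo<>(expensive);
        cached.get();
        cached.get();
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("computed " + timesComputed() + " time(s)");
    }
}
```

Two downstream reads trigger a single evaluation here, which is the behaviour the cache summary promises for a PCollection consumed by more than one stage.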

Constructor Detail

SparkPipeline

public SparkPipeline(String sparkConnect,
                     String appName)

SparkPipeline

public SparkPipeline(org.apache.spark.api.java.JavaSparkContext sparkContext,
                     String appName)

Method Detail