MRPipeline (Apache Crunch 0.9.0 API)

This project has retired. For details please refer to its Attic page.

Overview

Package

Class

Use

Tree

Deprecated

Index

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

org.apache.crunch.impl.mr
Class MRPipeline

java.lang.Object
  org.apache.crunch.impl.dist.DistributedPipeline
      org.apache.crunch.impl.mr.MRPipeline

All Implemented Interfaces:: Pipeline

public class MRPipeline
extends DistributedPipeline
extends DistributedPipeline

Pipeline implementation that is executed within Hadoop MapReduce.

Field Summary

Fields inherited from class org.apache.crunch.impl.dist.DistributedPipeline

factory, outputTargets, outputTargetsToMaterialize

Constructor Summary

MRPipeline(Class<?> jarClass)
          Instantiate with a default Configuration and name.

MRPipeline(Class<?> jarClass, org.apache.hadoop.conf.Configuration conf)
          Instantiate with a custom configuration and default naming.

MRPipeline(Class<?> jarClass, String name)
          Instantiate with a custom pipeline name.

MRPipeline(Class<?> jarClass, String name, org.apache.hadoop.conf.Configuration conf)
          Instantiate with a custom name and configuration.

Method Summary

<T> void cache(PCollection<T> pcollection, CachingOptions options)
          Caches the given PCollection so that it will be processed at most once during pipeline execution.

<T> Iterable<T> materialize(PCollection<T> pcollection)
          Create the given PCollection and read the data it contains into the returned Collection instance for client use.

MRExecutor plan()


PipelineResult run()
          Constructs and executes a series of MapReduce jobs in order to write data to the output targets.

MRPipelineExecution runAsync()
          Constructs and starts a series of MapReduce jobs in order ot write data to the output targets, but returns a ListenableFuture to allow clients to control job execution.

Methods inherited from class org.apache.crunch.impl.dist.DistributedPipeline

cleanup, createIntermediateOutput, createTempPath, done, enableDebug, getConfiguration, getFactory, getMaterializeSourceTarget, getName, getNextAnonymousStageId, read, read, readTextFile, setConfiguration, write, write, writeTextFile

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Fields inherited from class org.apache.crunch.impl.dist.DistributedPipeline
`factory, outputTargets, outputTargetsToMaterialize`

Constructor Summary
`MRPipeline(Class<?> jarClass)` Instantiate with a default Configuration and name.
`MRPipeline(Class<?> jarClass, org.apache.hadoop.conf.Configuration conf)` Instantiate with a custom configuration and default naming.
`MRPipeline(Class<?> jarClass, String name)` Instantiate with a custom pipeline name.
`MRPipeline(Class<?> jarClass, String name, org.apache.hadoop.conf.Configuration conf)` Instantiate with a custom name and configuration.

Methods inherited from class org.apache.crunch.impl.dist.DistributedPipeline
`cleanup, createIntermediateOutput, createTempPath, done, enableDebug, getConfiguration, getFactory, getMaterializeSourceTarget, getName, getNextAnonymousStageId, read, read, readTextFile, setConfiguration, write, write, writeTextFile`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

MRPipeline

public MRPipeline(Class<?> jarClass)

Instantiate with a default Configuration and name.

Parameters:: jarClass - Class containing the main driver method for running the pipeline

MRPipeline

public MRPipeline(Class<?> jarClass,
                  String name)

Instantiate with a custom pipeline name. The name will be displayed in the Hadoop JobTracker.

Parameters:: jarClass - Class containing the main driver method for running the pipeline; name - Display name of the pipeline

MRPipeline

public MRPipeline(Class<?> jarClass,
                  org.apache.hadoop.conf.Configuration conf)

Instantiate with a custom configuration and default naming.

Parameters:: jarClass - Class containing the main driver method for running the pipeline; conf - Configuration to be used within all MapReduce jobs run in the pipeline

MRPipeline

public MRPipeline(Class<?> jarClass,
                  String name,
                  org.apache.hadoop.conf.Configuration conf)

Instantiate with a custom name and configuration. The name will be displayed in the Hadoop JobTracker.

Parameters:: jarClass - Class containing the main driver method for running the pipeline; name - Display name of the pipeline; conf - Configuration to be used within all MapReduce jobs run in the pipeline

Method Detail

plan

public MRExecutor plan()

run

public PipelineResult run()

Description copied from interface: Pipeline

Constructs and executes a series of MapReduce jobs in order to write data to the output targets.

runAsync

public MRPipelineExecution runAsync()

Description copied from interface: Pipeline

Constructs and starts a series of MapReduce jobs in order ot write data to the output targets, but returns a ListenableFuture to allow clients to control job execution.

Returns:

materialize

public <T> Iterable<T> materialize(PCollection<T> pcollection)

Description copied from interface: Pipeline

Create the given PCollection and read the data it contains into the returned Collection instance for client use.

Parameters:: pcollection - The PCollection to materialize
Returns:: the data from the PCollection as a read-only Collection

cache

public <T> void cache(PCollection<T> pcollection,
                      CachingOptions options)

Description copied from interface: Pipeline

Caches the given PCollection so that it will be processed at most once during pipeline execution.

Parameters:: pcollection - The PCollection to cache; options - The options for how the cached data is stored

Overview

Package

Class

Use

Tree

Deprecated

Index

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

org.apache.crunch.impl.mr Class MRPipeline

MRPipeline

MRPipeline

MRPipeline

MRPipeline

plan

run

runAsync

materialize

cache

org.apache.crunch.impl.mr
Class MRPipeline