MRPipeline (Apache Crunch 0.15.0 API)

java.lang.Object
- org.apache.crunch.impl.dist.DistributedPipeline
- - org.apache.crunch.impl.mr.MRPipeline

All Implemented Interfaces:

Pipeline
```
public class MRPipeline
extends DistributedPipeline
```
Pipeline implementation that is executed within Hadoop MapReduce.

Constructor Summary

Constructors
Constructor and Description
`MRPipeline(Class<?> jarClass)` Instantiate with a default Configuration and name.
`MRPipeline(Class<?> jarClass, org.apache.hadoop.conf.Configuration conf)` Instantiate with a custom configuration and default naming.
`MRPipeline(Class<?> jarClass, String name)` Instantiate with a custom pipeline name.
`MRPipeline(Class<?> jarClass, String name, org.apache.hadoop.conf.Configuration conf)` Instantiate with a custom name and configuration.

Method Summary

Modifier and Type	Method and Description
`MRPipeline`	`addCompletionHook(org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook hook)`
`MRPipeline`	`addPrepareHook(org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook hook)`
`<T> void`	`cache(PCollection<T> pcollection, CachingOptions options)` Caches the given PCollection so that it will be processed at most once during pipeline execution.
`List<org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook>`	`getCompletionHooks()`
`List<org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook>`	`getPrepareHooks()`
`<T> Iterable<T>`	`materialize(PCollection<T> pcollection)` Create the given PCollection and read the data it contains into the returned Collection instance for client use.
`org.apache.crunch.impl.mr.exec.MRExecutor`	`plan()`
`PipelineResult`	`run()` Constructs and executes a series of MapReduce jobs in order to write data to the output targets.
`MRPipelineExecution`	`runAsync()` Constructs and starts a series of MapReduce jobs in order to write data to the output targets, but returns a `ListenableFuture` to allow clients to control job execution.
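
The contrast between `run()` and `runAsync()` in the table above can be pictured with a small stand-in that uses only `java.util.concurrent`. This is a sketch of the blocking versus future-returning calling styles, not Crunch code: `AsyncRunSketch`, `runJobs`, `runBlocking`, and `runInBackground` are hypothetical names, and `CompletableFuture` is used here only as a stand-in for the `ListenableFuture` that `runAsync()` is documented to return.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Stand-in only: no Crunch or Hadoop classes are used in this sketch.
class AsyncRunSketch {

    // Hypothetical placeholder for "execute the jobs"; reports success.
    static boolean runJobs() {
        return true;
    }

    // Blocking style (analogous to run()): the caller waits for the result.
    static boolean runBlocking() {
        return runJobs();
    }

    // Future-returning style (analogous to runAsync()): work starts in the
    // background and the caller receives a handle it can wait on or cancel.
    static CompletableFuture<Boolean> runInBackground() {
        return CompletableFuture.supplyAsync(AsyncRunSketch::runJobs);
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<Boolean> execution = runInBackground();
        // The caller is free to do other work here, poll execution.isDone(),
        // or call execution.cancel(true) before asking for the result.
        boolean succeeded = execution.get(30, TimeUnit.SECONDS);
        System.out.println("succeeded: " + succeeded);
    }
}
```

The point of the second style is that the decision to wait, time out, or cancel moves to the caller instead of being fixed inside the call.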

Methods inherited from class org.apache.crunch.impl.dist.DistributedPipeline
cleanup, create, create, create, create, createIntermediateOutput, createTempPath, done, emptyPCollection, emptyPTable, enableDebug, getConfiguration, getFactory, getMaterializeSourceTarget, getName, getNextAnonymousStageId, read, read, read, read, readTextFile, sequentialDo, setConfiguration, union, unionTables, write, write, writeTextFile

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - MRPipeline
```
public MRPipeline(Class<?> jarClass)
```
    Instantiate with a default Configuration and name.
    
    Parameters:
    
    jarClass - Class containing the main driver method for running the pipeline
  - MRPipeline
```
public MRPipeline(Class<?> jarClass,
                  String name)
```
    Instantiate with a custom pipeline name. The name will be displayed in the Hadoop JobTracker.
    
    Parameters:
    
    jarClass - Class containing the main driver method for running the pipeline
    
    name - Display name of the pipeline
  - MRPipeline
```
public MRPipeline(Class<?> jarClass,
                  org.apache.hadoop.conf.Configuration conf)
```
    Instantiate with a custom configuration and default naming.
    
    Parameters:
    
    jarClass - Class containing the main driver method for running the pipeline
    
    conf - Configuration to be used within all MapReduce jobs run in the pipeline
  - MRPipeline
```
public MRPipeline(Class<?> jarClass,
                  String name,
                  org.apache.hadoop.conf.Configuration conf)
```
    Instantiate with a custom name and configuration. The name will be displayed in the Hadoop JobTracker.
    
    Parameters:
    
    jarClass - Class containing the main driver method for running the pipeline
    
    name - Display name of the pipeline
    
    conf - Configuration to be used within all MapReduce jobs run in the pipeline
- Method Detail
  - addPrepareHook
```
public MRPipeline addPrepareHook(org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook hook)
```
  - getPrepareHooks
```
public List<org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook> getPrepareHooks()
```
  - addCompletionHook
```
public MRPipeline addCompletionHook(org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook hook)
```
  - getCompletionHooks
```
public List<org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.Hook> getCompletionHooks()
```
  - plan
```
public org.apache.crunch.impl.mr.exec.MRExecutor plan()
```
  - run
```
public PipelineResult run()
```
    Description copied from interface: Pipeline
    
    Constructs and executes a series of MapReduce jobs in order to write data to the output targets.
  - runAsync
```
public MRPipelineExecution runAsync()
```
    Description copied from interface: Pipeline
    
    Constructs and starts a series of MapReduce jobs in order to write data to the output targets, but returns a ListenableFuture to allow clients to control job execution.
    
    Returns:
  - materialize
```
public <T> Iterable<T> materialize(PCollection<T> pcollection)
```
    Description copied from interface: Pipeline
    
    Create the given PCollection and read the data it contains into the returned Collection instance for client use.
    
    Parameters:
    
    pcollection - The PCollection to materialize
    
    Returns:
    
    the data from the PCollection as a read-only Collection
  - cache
```
public <T> void cache(PCollection<T> pcollection,
                      CachingOptions options)
```
    Description copied from interface: Pipeline
    
    Caches the given PCollection so that it will be processed at most once during pipeline execution.
    
    Parameters:
    
    pcollection - The PCollection to cache
    
    options - The options for how the cached data is stored
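
The "processed at most once" guarantee described for `cache` can be illustrated with a plain-Java memoizing wrapper. This is a minimal sketch of the idea only, under the assumption that two downstream consumers read the same intermediate result; it says nothing about how Crunch actually stores cached data or what `CachingOptions` controls. `CacheOnceSketch`, `cached`, `runsWithCache`, and `runsWithoutCache` are hypothetical names, and no Crunch classes are involved.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Stand-in only: models "compute at most once", not Crunch's cache().
class CacheOnceSketch {

    // Wraps a supplier so the underlying computation runs at most once;
    // later calls return the stored value.
    static <T> Supplier<T> cached(final Supplier<T> compute) {
        return new Supplier<T>() {
            private boolean computed;
            private T value;

            @Override
            public synchronized T get() {
                if (!computed) {
                    value = compute.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    // Two consumers read a cached result; returns how often the work ran.
    static int runsWithCache() {
        final AtomicInteger runs = new AtomicInteger();
        Supplier<Integer> expensive = CacheOnceSketch.<Integer>cached(() -> {
            runs.incrementAndGet();
            return 42;
        });
        expensive.get(); // first consumer triggers the computation
        expensive.get(); // second consumer reuses the stored value
        return runs.get();
    }

    // Same two consumers without the wrapper; the work runs for each read.
    static int runsWithoutCache() {
        final AtomicInteger runs = new AtomicInteger();
        Supplier<Integer> expensive = () -> {
            runs.incrementAndGet();
            return 42;
        };
        expensive.get();
        expensive.get();
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("with cache: " + runsWithCache());
        System.out.println("without cache: " + runsWithoutCache());
    }
}
```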
