MemPipeline (Apache Crunch 0.10.0 API)

This project has retired. For details please refer to its Attic page.

org.apache.crunch.impl.mem
Class MemPipeline

java.lang.Object
  org.apache.crunch.impl.mem.MemPipeline

All Implemented Interfaces: Pipeline

public class MemPipeline
extends Object
implements Pipeline

Method Summary

<T> void cache(PCollection<T> pcollection, CachingOptions options)
          Caches the given PCollection so that it will be processed at most once during pipeline execution.

void cleanup(boolean force)
          Cleans up any artifacts created as a result of running the pipeline.

static void clearCounters()


static <T> PCollection<T> collectionOf(Iterable<T> collect)


static <T> PCollection<T> collectionOf(T... ts)


PipelineResult done()
          Run any remaining jobs required to generate outputs and then clean up any intermediate data files that were created in this run or previous calls to run.

<T> PCollection<T> emptyPCollection(PType<T> ptype)


<K,V> PTable<K,V> emptyPTable(PTableType<K,V> ptype)


void enableDebug()
          Turn on debug logging for jobs that are run from this pipeline.

org.apache.hadoop.conf.Configuration getConfiguration()
          Returns the Configuration instance associated with this pipeline.

static org.apache.hadoop.mapreduce.Counters getCounters()


static Pipeline getInstance()


String getName()
          Returns the name of this pipeline.

<T> Iterable<T> materialize(PCollection<T> pcollection)
          Create the given PCollection and read the data it contains into the returned Collection instance for client use.

<T> PCollection<T> read(Source<T> source)
          Converts the given Source into a PCollection that is available to jobs run using this Pipeline instance.

<K,V> PTable<K,V> read(TableSource<K,V> source)
          A version of the read method for TableSource instances that map to PTables.

PCollection<String> readTextFile(String pathName)
          A convenience method for reading a text file.

PipelineResult run()
          Constructs and executes a series of MapReduce jobs in order to write data to the output targets.

PipelineExecution runAsync()
          Constructs and starts a series of MapReduce jobs in order to write data to the output targets, but returns a ListenableFuture to allow clients to control job execution.

void setConfiguration(org.apache.hadoop.conf.Configuration conf)
          Set the Configuration to use with this pipeline.

static <S,T> PTable<S,T> tableOf(Iterable<Pair<S,T>> pairs)


static <S,T> PTable<S,T> tableOf(S s, T t, Object... more)


static <T> PCollection<T> typedCollectionOf(PType<T> ptype, Iterable<T> collect)


static <T> PCollection<T> typedCollectionOf(PType<T> ptype, T... ts)


static <S,T> PTable<S,T> typedTableOf(PTableType<S,T> ptype, Iterable<Pair<S,T>> pairs)


static <S,T> PTable<S,T> typedTableOf(PTableType<S,T> ptype, S s, T t, Object... more)


void write(PCollection<?> collection, Target target)
          Write the given collection to the given target on the next pipeline run.

void write(PCollection<?> collection, Target target, Target.WriteMode writeMode)
          Write the contents of the PCollection to the given Target, using the storage format specified by the target and the given WriteMode for cases where the referenced Target already exists.

<T> void writeTextFile(PCollection<T> collection, String pathName)
          A convenience method for writing a text file.

Methods inherited from class java.lang.Object

equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Detail

getCounters

public static org.apache.hadoop.mapreduce.Counters getCounters()

clearCounters

public static void clearCounters()

getInstance

public static Pipeline getInstance()

collectionOf

public static <T> PCollection<T> collectionOf(T... ts)

collectionOf

public static <T> PCollection<T> collectionOf(Iterable<T> collect)

typedCollectionOf

public static <T> PCollection<T> typedCollectionOf(PType<T> ptype,
                                                   T... ts)

typedCollectionOf

public static <T> PCollection<T> typedCollectionOf(PType<T> ptype,
                                                   Iterable<T> collect)

tableOf

public static <S,T> PTable<S,T> tableOf(S s,
                                        T t,
                                        Object... more)
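
This page gives no description for the varargs overload, so the calling convention is worth spelling out. The sketch below is plain Java with no Crunch dependency, and it assumes that the trailing `more` arguments alternate key, value, key, value after the first pair. The class `VarargsPairingSketch` and the helper `pairsOf` are hypothetical names used only for illustration; they are not part of Crunch, and the rejection of an odd number of trailing arguments is a choice made for this sketch, not documented behavior.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class VarargsPairingSketch {

    // Hypothetical helper, not part of Crunch. It records (s, t) as the first
    // pair, then walks `more` two elements at a time, treating each even index
    // as a key and the following odd index as its value.
    @SuppressWarnings("unchecked")
    public static <S, T> List<Map.Entry<S, T>> pairsOf(S s, T t, Object... more) {
        if (more.length % 2 != 0) {
            // Assumption of this sketch: a key without a value is an error.
            throw new IllegalArgumentException("trailing key without a value");
        }
        List<Map.Entry<S, T>> pairs = new ArrayList<>();
        pairs.add(new SimpleImmutableEntry<>(s, t));
        for (int i = 0; i < more.length; i += 2) {
            // Unchecked casts: the compiler cannot verify the types in `more`.
            pairs.add(new SimpleImmutableEntry<>((S) more[i], (T) more[i + 1]));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = pairsOf("a", 1, "b", 2, "c", 3);
        for (Map.Entry<String, Integer> e : pairs) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Because `more` is typed as `Object...`, a signature of this shape cannot check the types of the trailing arguments at compile time; a mismatched key or value would only surface later as a `ClassCastException`. The `Iterable<Pair<S,T>>` overload avoids this when type safety matters.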

typedTableOf

public static <S,T> PTable<S,T> typedTableOf(PTableType<S,T> ptype,
                                             S s,
                                             T t,
                                             Object... more)

tableOf

public static <S,T> PTable<S,T> tableOf(Iterable<Pair<S,T>> pairs)

typedTableOf

public static <S,T> PTable<S,T> typedTableOf(PTableType<S,T> ptype,
                                             Iterable<Pair<S,T>> pairs)

setConfiguration

public void setConfiguration(org.apache.hadoop.conf.Configuration conf)

Description copied from interface: Pipeline

Set the Configuration to use with this pipeline.

Specified by: setConfiguration in interface Pipeline

getConfiguration

public org.apache.hadoop.conf.Configuration getConfiguration()

Description copied from interface: Pipeline

Returns the Configuration instance associated with this pipeline.

Specified by: getConfiguration in interface Pipeline

read

public <T> PCollection<T> read(Source<T> source)

Description copied from interface: Pipeline

Converts the given Source into a PCollection that is available to jobs run using this Pipeline instance.

Specified by: read in interface Pipeline

Parameters: source - The source of data
Returns: A PCollection that references the given source

read

public <K,V> PTable<K,V> read(TableSource<K,V> source)

Description copied from interface: Pipeline

A version of the read method for TableSource instances that map to PTables.

Specified by: read in interface Pipeline

Parameters: source - The source of the data
Returns: A PTable that references the given source

write

public void write(PCollection<?> collection,
                  Target target)

Description copied from interface: Pipeline

Write the given collection to the given target on the next pipeline run. The system will check to see if the target's location already exists using the WriteMode.DEFAULT rule for the given Target.

Specified by: write in interface Pipeline

Parameters: collection - The collection; target - The output target

write

public void write(PCollection<?> collection,
                  Target target,
                  Target.WriteMode writeMode)

Description copied from interface: Pipeline

Write the contents of the PCollection to the given Target, using the storage format specified by the target and the given WriteMode for cases where the referenced Target already exists.

Specified by: write in interface Pipeline

Parameters: collection - The collection; target - The target to write to; writeMode - The strategy to use for handling existing outputs

readTextFile

public PCollection<String> readTextFile(String pathName)

Description copied from interface: Pipeline

A convenience method for reading a text file.

Specified by: readTextFile in interface Pipeline

writeTextFile

public <T> void writeTextFile(PCollection<T> collection,
                              String pathName)

Description copied from interface: Pipeline

A convenience method for writing a text file.

Specified by: writeTextFile in interface Pipeline

materialize

public <T> Iterable<T> materialize(PCollection<T> pcollection)

Description copied from interface: Pipeline

Create the given PCollection and read the data it contains into the returned Collection instance for client use.

Specified by: materialize in interface Pipeline

Parameters: pcollection - The PCollection to materialize
Returns: the data from the PCollection as a read-only Collection

cache

public <T> void cache(PCollection<T> pcollection,
                      CachingOptions options)

Description copied from interface: Pipeline

Caches the given PCollection so that it will be processed at most once during pipeline execution.

Specified by: cache in interface Pipeline

Parameters: pcollection - The PCollection to cache; options - The options for how the cached data is stored

emptyPCollection

public <T> PCollection<T> emptyPCollection(PType<T> ptype)

Specified by: emptyPCollection in interface Pipeline

emptyPTable

public <K,V> PTable<K,V> emptyPTable(PTableType<K,V> ptype)

Specified by: emptyPTable in interface Pipeline

runAsync

public PipelineExecution runAsync()

Description copied from interface: Pipeline

Constructs and starts a series of MapReduce jobs in order to write data to the output targets, but returns a ListenableFuture to allow clients to control job execution.

Specified by: runAsync in interface Pipeline

run

public PipelineResult run()

Description copied from interface: Pipeline

Constructs and executes a series of MapReduce jobs in order to write data to the output targets.

Specified by: run in interface Pipeline

cleanup

public void cleanup(boolean force)

Description copied from interface: Pipeline

Cleans up any artifacts created as a result of running the pipeline.

Specified by: cleanup in interface Pipeline

Parameters: force - forces the cleanup even if not all targets of the pipeline have been completed.

done

public PipelineResult done()

Description copied from interface: Pipeline

Run any remaining jobs required to generate outputs and then clean up any intermediate data files that were created in this run or previous calls to run.

Specified by: done in interface Pipeline

enableDebug

public void enableDebug()

Description copied from interface: Pipeline

Turn on debug logging for jobs that are run from this pipeline.

Specified by: enableDebug in interface Pipeline

getName

public String getName()

Description copied from interface: Pipeline

Returns the name of this pipeline.

Specified by: getName in interface Pipeline

Returns: Name of the pipeline
