Modifier and Type | Method and Description |
---|---|
static <T> PCollection<T> |
collectionOf(Iterable<T> collect) |
static <T> PCollection<T> |
collectionOf(T... ts) |
PipelineResult |
done()
Run any remaining jobs required to generate outputs and then clean up any
intermediate data files that were created in this run or previous calls to
run . |
void |
enableDebug()
Turn on debug logging for jobs that are run from this pipeline.
|
org.apache.hadoop.conf.Configuration |
getConfiguration()
Returns the
Configuration instance associated with this pipeline. |
static org.apache.hadoop.mapreduce.Counters |
getCounters() |
static Pipeline |
getInstance() |
String |
getName()
Returns the name of this pipeline.
|
<T> Iterable<T> |
materialize(PCollection<T> pcollection)
Create the given PCollection and read the data it contains into the
returned Collection instance for client use.
|
<T> PCollection<T> |
read(Source<T> source)
Converts the given
Source into a PCollection that is
available to jobs run using this Pipeline instance. |
<K,V> PTable<K,V> |
read(TableSource<K,V> source)
A version of the read method for
TableSource instances that map to
PTable s. |
PCollection<String> |
readTextFile(String pathName)
A convenience method for reading a text file.
|
PipelineResult |
run()
Constructs and executes a series of MapReduce jobs in order to write data
to the output targets.
|
void |
setConfiguration(org.apache.hadoop.conf.Configuration conf)
Set the
Configuration to use with this pipeline. |
static <S,T> PTable<S,T> |
tableOf(Iterable<Pair<S,T>> pairs) |
static <S,T> PTable<S,T> |
tableOf(S s,
T t,
Object... more) |
static <T> PCollection<T> |
typedCollectionOf(PType<T> ptype,
Iterable<T> collect) |
static <T> PCollection<T> |
typedCollectionOf(PType<T> ptype,
T... ts) |
static <S,T> PTable<S,T> |
typedTableOf(PTableType<S,T> ptype,
Iterable<Pair<S,T>> pairs) |
static <S,T> PTable<S,T> |
typedTableOf(PTableType<S,T> ptype,
S s,
T t,
Object... more) |
void |
write(PCollection<?> collection,
Target target)
Write the given collection to the given target on the next pipeline run.
|
void |
write(PCollection<?> collection,
Target target,
Target.WriteMode writeMode)
Write the contents of the
PCollection to the given Target ,
using the storage format specified by the target and the given
WriteMode for cases where the referenced Target
already exists. |
<T> void |
writeTextFile(PCollection<T> collection,
String pathName)
A convenience method for writing a text file.
|
public static org.apache.hadoop.mapreduce.Counters getCounters()
public static Pipeline getInstance()
public static <T> PCollection<T> collectionOf(T... ts)
public static <T> PCollection<T> collectionOf(Iterable<T> collect)
public static <T> PCollection<T> typedCollectionOf(PType<T> ptype, T... ts)
public static <T> PCollection<T> typedCollectionOf(PType<T> ptype, Iterable<T> collect)
public static <S,T> PTable<S,T> typedTableOf(PTableType<S,T> ptype, S s, T t, Object... more)
public static <S,T> PTable<S,T> typedTableOf(PTableType<S,T> ptype, Iterable<Pair<S,T>> pairs)
public void setConfiguration(org.apache.hadoop.conf.Configuration conf)
Pipeline
Configuration
to use with this pipeline.setConfiguration
in interface Pipeline
public org.apache.hadoop.conf.Configuration getConfiguration()
Pipeline
Configuration
instance associated with this pipeline.getConfiguration
in interface Pipeline
public <T> PCollection<T> read(Source<T> source)
Pipeline
Source
into a PCollection
that is
available to jobs run using this Pipeline
instance.public <K,V> PTable<K,V> read(TableSource<K,V> source)
Pipeline
TableSource
instances that map to
PTable
s.public void write(PCollection<?> collection, Target target)
Pipeline
WriteMode.DEFAULT
rule for the given Target
.public void write(PCollection<?> collection, Target target, Target.WriteMode writeMode)
Pipeline
PCollection
to the given Target
,
using the storage format specified by the target and the given
WriteMode
for cases where the referenced Target
already exists.public PCollection<String> readTextFile(String pathName)
Pipeline
readTextFile
in interface Pipeline
public <T> void writeTextFile(PCollection<T> collection, String pathName)
Pipeline
writeTextFile
in interface Pipeline
public <T> Iterable<T> materialize(PCollection<T> pcollection)
Pipeline
materialize
in interface Pipeline
pcollection
- The PCollection to materializepublic PipelineResult run()
Pipeline
public PipelineResult done()
Pipeline
run
.public void enableDebug()
Pipeline
enableDebug
in interface Pipeline
Copyright © 2013 The Apache Software Foundation. All Rights Reserved.