Modifier and Type | Method and Description |
---|---|
<T> void |
cache(PCollection<T> pcollection,
CachingOptions options)
Caches the given PCollection so that it will be processed at most once
during pipeline execution.
|
void |
cleanup(boolean force)
Cleans up any artifacts created as a result of
running the pipeline. |
static void |
clearCounters() |
static <T> PCollection<T> |
collectionOf(Iterable<T> collect) |
static <T> PCollection<T> |
collectionOf(T... ts) |
<K,V> PTable<K,V> |
create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype)
Creates a
PTable containing the values found in the given Iterable
using an implementation-specific distribution mechanism. |
<K,V> PTable<K,V> |
create(Iterable<Pair<K,V>> contents,
PTableType<K,V> ptype,
CreateOptions options)
Creates a
PTable containing the values found in the given Iterable
using an implementation-specific distribution mechanism. |
<T> PCollection<T> |
create(Iterable<T> contents,
PType<T> ptype)
Creates a
PCollection containing the values found in the given Iterable
using an implementation-specific distribution mechanism. |
<T> PCollection<T> |
create(Iterable<T> iterable,
PType<T> ptype,
CreateOptions options)
Creates a
PCollection containing the values found in the given Iterable
using an implementation-specific distribution mechanism. |
PipelineResult |
done()
Run any remaining jobs required to generate outputs and then clean up any
intermediate data files that were created in this run or previous calls to
run . |
<T> PCollection<T> |
emptyPCollection(PType<T> ptype)
Creates an empty
PCollection of the given PType . |
<K,V> PTable<K,V> |
emptyPTable(PTableType<K,V> ptype)
Creates an empty
PTable of the given PTable Type . |
void |
enableDebug()
Turn on debug logging for jobs that are run from this pipeline.
|
org.apache.hadoop.conf.Configuration |
getConfiguration()
Returns the
Configuration instance associated with this pipeline. |
static org.apache.hadoop.mapreduce.Counters |
getCounters() |
static Pipeline |
getInstance() |
String |
getName()
Returns the name of this pipeline.
|
<T> Iterable<T> |
materialize(PCollection<T> pcollection)
Create the given PCollection and read the data it contains into the
returned Collection instance for client use.
|
<T> PCollection<T> |
read(Source<T> source)
Converts the given
Source into a PCollection that is
available to jobs run using this Pipeline instance. |
<T> PCollection<T> |
read(Source<T> source,
String named)
Converts the given
Source into a PCollection that is
available to jobs run using this Pipeline instance. |
<K,V> PTable<K,V> |
read(TableSource<K,V> source)
A version of the read method for
TableSource instances that map to
PTable s. |
<K,V> PTable<K,V> |
read(TableSource<K,V> source,
String named)
A version of the read method for
TableSource instances that map to
PTable s. |
PCollection<String> |
readTextFile(String pathName)
A convenience method for reading a text file.
|
PipelineResult |
run()
Constructs and executes a series of MapReduce jobs in order to write data
to the output targets.
|
PipelineExecution |
runAsync()
Constructs and starts a series of MapReduce jobs in order ot write data to
the output targets, but returns a
ListenableFuture to allow clients to control
job execution. |
<Output> Output |
sequentialDo(PipelineCallable<Output> callable)
Executes the given
PipelineCallable on the client after the Targets
that the PipelineCallable depends on (if any) have been created by other pipeline
processing steps. |
void |
setConfiguration(org.apache.hadoop.conf.Configuration conf)
Set the
Configuration to use with this pipeline. |
static <S,T> PTable<S,T> |
tableOf(Iterable<Pair<S,T>> pairs) |
static <S,T> PTable<S,T> |
tableOf(S s,
T t,
Object... more) |
static <T> PCollection<T> |
typedCollectionOf(PType<T> ptype,
Iterable<T> collect) |
static <T> PCollection<T> |
typedCollectionOf(PType<T> ptype,
T... ts) |
static <S,T> PTable<S,T> |
typedTableOf(PTableType<S,T> ptype,
Iterable<Pair<S,T>> pairs) |
static <S,T> PTable<S,T> |
typedTableOf(PTableType<S,T> ptype,
S s,
T t,
Object... more) |
<S> PCollection<S> |
union(List<PCollection<S>> collections) |
<K,V> PTable<K,V> |
unionTables(List<PTable<K,V>> tables) |
void |
write(PCollection<?> collection,
Target target)
Write the given collection to the given target on the next pipeline run.
|
void |
write(PCollection<?> collection,
Target target,
Target.WriteMode writeMode)
Write the contents of the
PCollection to the given Target ,
using the storage format specified by the target and the given
WriteMode for cases where the referenced Target
already exists. |
<T> void |
writeTextFile(PCollection<T> collection,
String pathName)
A convenience method for writing a text file.
|
public static org.apache.hadoop.mapreduce.Counters getCounters()
public static void clearCounters()
public static Pipeline getInstance()
public static <T> PCollection<T> collectionOf(T... ts)
public static <T> PCollection<T> collectionOf(Iterable<T> collect)
public static <T> PCollection<T> typedCollectionOf(PType<T> ptype, T... ts)
public static <T> PCollection<T> typedCollectionOf(PType<T> ptype, Iterable<T> collect)
public static <S,T> PTable<S,T> typedTableOf(PTableType<S,T> ptype, S s, T t, Object... more)
public static <S,T> PTable<S,T> typedTableOf(PTableType<S,T> ptype, Iterable<Pair<S,T>> pairs)
public void setConfiguration(org.apache.hadoop.conf.Configuration conf)
Pipeline
Configuration
to use with this pipeline.setConfiguration
in interface Pipeline
public org.apache.hadoop.conf.Configuration getConfiguration()
Pipeline
Configuration
instance associated with this pipeline.getConfiguration
in interface Pipeline
public <T> PCollection<T> read(Source<T> source)
Pipeline
Source
into a PCollection
that is
available to jobs run using this Pipeline
instance.public <T> PCollection<T> read(Source<T> source, String named)
Pipeline
Source
into a PCollection
that is
available to jobs run using this Pipeline
instance.public <K,V> PTable<K,V> read(TableSource<K,V> source)
Pipeline
TableSource
instances that map to
PTable
s.public <K,V> PTable<K,V> read(TableSource<K,V> source, String named)
Pipeline
TableSource
instances that map to
PTable
s.public void write(PCollection<?> collection, Target target)
Pipeline
WriteMode.DEFAULT
rule for the given Target
.public void write(PCollection<?> collection, Target target, Target.WriteMode writeMode)
Pipeline
PCollection
to the given Target
,
using the storage format specified by the target and the given
WriteMode
for cases where the referenced Target
already exists.public PCollection<String> readTextFile(String pathName)
Pipeline
readTextFile
in interface Pipeline
public <T> void writeTextFile(PCollection<T> collection, String pathName)
Pipeline
writeTextFile
in interface Pipeline
public <T> Iterable<T> materialize(PCollection<T> pcollection)
Pipeline
materialize
in interface Pipeline
pcollection
- The PCollection to materializepublic <T> void cache(PCollection<T> pcollection, CachingOptions options)
Pipeline
public <T> PCollection<T> emptyPCollection(PType<T> ptype)
Pipeline
PCollection
of the given PType
.emptyPCollection
in interface Pipeline
ptype
- The PType of the empty PCollectionpublic <K,V> PTable<K,V> emptyPTable(PTableType<K,V> ptype)
Pipeline
PTable
of the given PTable Type
.emptyPTable
in interface Pipeline
ptype
- The PTableType of the empty PTablepublic <T> PCollection<T> create(Iterable<T> contents, PType<T> ptype)
Pipeline
PCollection
containing the values found in the given Iterable
using an implementation-specific distribution mechanism.public <T> PCollection<T> create(Iterable<T> iterable, PType<T> ptype, CreateOptions options)
Pipeline
PCollection
containing the values found in the given Iterable
using an implementation-specific distribution mechanism.public <K,V> PTable<K,V> create(Iterable<Pair<K,V>> contents, PTableType<K,V> ptype)
Pipeline
PTable
containing the values found in the given Iterable
using an implementation-specific distribution mechanism.public <K,V> PTable<K,V> create(Iterable<Pair<K,V>> contents, PTableType<K,V> ptype, CreateOptions options)
Pipeline
PTable
containing the values found in the given Iterable
using an implementation-specific distribution mechanism.public <S> PCollection<S> union(List<PCollection<S>> collections)
public <K,V> PTable<K,V> unionTables(List<PTable<K,V>> tables)
unionTables
in interface Pipeline
public <Output> Output sequentialDo(PipelineCallable<Output> callable)
Pipeline
PipelineCallable
on the client after the Targets
that the PipelineCallable depends on (if any) have been created by other pipeline
processing steps.sequentialDo
in interface Pipeline
Output
- The return type of the PipelineCallablecallable
- The sequential logic to executepublic PipelineExecution runAsync()
Pipeline
ListenableFuture
to allow clients to control
job execution.public PipelineResult run()
Pipeline
public void cleanup(boolean force)
Pipeline
running
the pipeline.public PipelineResult done()
Pipeline
run
.public void enableDebug()
Pipeline
enableDebug
in interface Pipeline
Copyright © 2016 The Apache Software Foundation. All rights reserved.