|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object org.apache.crunch.PipelineCallable<Output>
Output
- the output value returned by this instance (Void, PCollection, Pair<PCollection, PCollection>,
etc.public abstract class PipelineCallable<Output>
A specialization of Callable
that executes some sequential logic on the client machine as
part of an overall Crunch pipeline in order to generate zero or more outputs, some of
which may be PCollection
instances that are processed by other jobs in the
pipeline.
PipelineCallable
is intended to be used to inject auxiliary logic into the control
flow of a Crunch pipeline. This can be used for a number of purposes, such as importing or
exporting data into a cluster using Apache Sqoop, executing a legacy MapReduce job
or Pig/Hive script within a Crunch pipeline, or sending emails or status notifications
about the status of a long-running pipeline during its execution.
The Crunch planner needs to know three things about a PipelineCallable
instance in order
to manage it:
Target
and PCollection
instances that must have been materialized
before this instance is allowed to run. This information should be specified via the dependsOn
methods of the class.PCollection
instances that are used as inputs in other Crunch jobs. These outputs should
be specified by the getOutput(Pipeline)
method of the class, which will be executed immediately
after this instance is registered with the Pipeline.sequentialDo(org.apache.crunch.PipelineCallable
method.call
method of the class.If a given PipelineCallable does not have any dependencies, it will be executed before any jobs are run
by the planner. After that, the planner will keep track of when the dependencies of a given instance
have been materialized, and then execute the instance as soon as they all exist. The Crunch planner
uses a thread pool executor to run multiple PipelineCallable
instances simultaneously, but you can
indicate that an instance should be run by itself by overriding the boolean runSingleThreaded()
method
below to return true.
The call
method returns a Status
to indicate whether it succeeded or failed. A failed
instance, or any exceptions/errors thrown by the call method, will cause the overall Crunch pipeline containing
this instance to fail.
A number of helper methods for accessing the dependent Target/PCollection instances that this instance
needs to exist, as well as the Configuration
instance for the overall Pipeline execution, are available
as protected methods in this class so that they may be accessed from implementations of PipelineCallable
within the call
method.
Nested Class Summary | |
---|---|
static class |
PipelineCallable.Status
|
Constructor Summary | |
---|---|
PipelineCallable()
|
Method Summary | |
---|---|
PipelineCallable<Output> |
dependsOn(String label,
PCollection<?> pcollect)
Requires that the given PCollection be materialized to disk before this instance may be
executed. |
PipelineCallable<Output> |
dependsOn(String label,
Target t)
Requires that the given Target exists before this instance may be
executed. |
Output |
generateOutput(Pipeline pipeline)
Called by the Pipeline when this instance is registered with Pipeline#sequentialDo . |
Map<String,PCollection<?>> |
getAllPCollections()
Returns the mapping of labels to PCollection dependencies for this instance. |
Map<String,Target> |
getAllTargets()
Returns the mapping of labels to Target dependencies for this instance. |
String |
getMessage()
Returns a message associated with this callable's execution, especially in case of errors. |
String |
getName()
Returns the name of this instance. |
PipelineCallable<Output> |
named(String name)
Use the given name to identify this instance in the logs. |
boolean |
runSingleThreaded()
Override this method to indicate to the planner that this instance should not be run at the same time as any other PipelineCallable instances. |
void |
setMessage(String message)
Sets a message associated with this callable's execution, especially in case of errors. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface java.util.concurrent.Callable |
---|
call |
Constructor Detail |
---|
public PipelineCallable()
Method Detail |
---|
public boolean runSingleThreaded()
PipelineCallable
instances.
public PipelineCallable<Output> dependsOn(String label, Target t)
Target
exists before this instance may be
executed.
label
- A string that can be used to retrieve the given Target inside of the call
method.t
- the Target
itself
public PipelineCallable<Output> dependsOn(String label, PCollection<?> pcollect)
PCollection
be materialized to disk before this instance may be
executed.
label
- A string that can be used to retrieve the given PCollection inside of the call
method.pcollect
- the PCollection
itself
public Output generateOutput(Pipeline pipeline)
Pipeline
when this instance is registered with Pipeline#sequentialDo
. In general,
clients should override the protected getOutput(Pipeline)
method instead of this one.
public String getName()
public PipelineCallable<Output> named(String name)
public String getMessage()
public void setMessage(String message)
public Map<String,PCollection<?>> getAllPCollections()
public Map<String,Target> getAllTargets()
|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |