Interface | Description |
---|---|
`Aggregator<T>` | Aggregate a sequence of values into a possibly smaller sequence of the same type. |
`Emitter<T>` | Interface for writing outputs from a `DoFn`. |
`PCollection<S>` | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
`PGroupedTable<K,V>` | The Crunch representation of a grouped `PTable`, which corresponds to the output of the shuffle phase of a MapReduce job. |
`Pipeline` | Manages the state of a pipeline execution. |
`PipelineExecution` | A handle to allow clients to control a Crunch pipeline as it runs. |
`PObject<T>` | A `PObject` represents a singleton object value that results from a distributed computation. |
`PTable<K,V>` | A sub-interface of `PCollection` that represents an immutable, distributed multi-map of keys and values. |
`ReadableData<T>` | Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline. |
`Source<T>` | A `Source` represents an input data set that is an input to one or more MapReduce jobs. |
`SourceTarget<T>` | An interface for classes that implement both the `Source` and the `Target` interfaces. |
`TableSource<K,V>` | The interface for `Source` implementations that return a `PTable`. |
`TableSourceTarget<K,V>` | An interface for classes that implement both the `TableSource` and the `Target` interfaces. |
`Target` | A `Target` represents the output destination of a Crunch `PCollection` in the context of a Crunch job. |
`Tuple` | A fixed-size collection of Objects, used in Crunch for representing joins between `PCollection`s. |
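
The `PTable` and `PGroupedTable` rows describe a distributed multi-map whose grouped form corresponds to the output of the shuffle phase. A minimal in-memory sketch of that relationship follows, assuming Java 9+. `GroupingModel`, `groupByKey`, and `sumValues` here are hypothetical names chosen for illustration, not the Crunch API; the model also ignores partitioning, key sorting, and distribution, all of which a real shuffle performs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of PTable -> PGroupedTable; NOT the Crunch API.
// Real PTables are distributed and lazily evaluated, and the real shuffle also
// partitions and sorts keys, none of which is modeled here.
public class GroupingModel {

    // A PTable<K,V> is a multi-map: the same key may occur any number of times,
    // so it is modeled as a plain list of key/value entries.
    // groupByKey gathers every value for each key into one group, mirroring the
    // role the shuffle phase plays between map and reduce.
    static <K, V> Map<K, List<V>> groupByKey(List<? extends Map.Entry<K, V>> table) {
        Map<K, List<V>> grouped = new LinkedHashMap<>(); // keeps first-seen key order
        for (Map.Entry<K, V> entry : table) {
            grouped.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                   .add(entry.getValue());
        }
        return grouped;
    }

    // Hypothetical follow-up step: sums each group, e.g. per-word 1s into counts.
    static <K> Map<K, Integer> sumValues(Map<K, List<Integer>> grouped) {
        Map<K, Integer> sums = new LinkedHashMap<>();
        grouped.forEach((key, values) -> {
            int total = 0;
            for (int v : values) {
                total += v;
            }
            sums.put(key, total);
        });
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> table = List.of(
                Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1));
        Map<String, List<Integer>> grouped = groupByKey(table);
        System.out.println(grouped);            // {a=[1, 1], b=[1]}
        System.out.println(sumValues(grouped)); // {a=2, b=1}
    }
}
```

The duplicate key `"a"` survives in the ungrouped table and is collapsed into a single group only after `groupByKey`, which is the distinction the two interfaces draw.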
Class | Description |
---|---|
`CachingOptions` | Options for controlling how a `PCollection<T>` is cached for subsequent processing. |
`CachingOptions.Builder` | A Builder class to use for setting the `CachingOptions` for a `PCollection`. |
`CombineFn<S,T>` | |
`CreateOptions` | Additional options that can be specified when creating a new `PCollection` using `Pipeline.create(java.lang.Iterable<T>, org.apache.crunch.types.PType<T>)`. |
`DoFn<S,T>` | Base class for all data processing functions in Crunch. |
`FilterFn<T>` | A `DoFn` for the common case of filtering the members of a `PCollection` based on a boolean condition. |
`GroupingOptions` | Options that can be passed to a `groupByKey` operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed. |
`GroupingOptions.Builder` | Builder class for creating `GroupingOptions` instances. |
`MapFn<S,T>` | A `DoFn` for the common case of emitting exactly one value for each input record. |
`Pair<K,V>` | A convenience class for two-element `Tuple`s. |
`ParallelDoOptions` | Container class that includes optional information about a `parallelDo` operation applied to a `PCollection`. |
`ParallelDoOptions.Builder` | |
`PipelineCallable<Output>` | A specialization of `Callable` that executes some sequential logic on the client machine as part of an overall Crunch pipeline in order to generate zero or more outputs, some of which may be `PCollection` instances that are processed by other jobs in the pipeline. |
`PipelineResult` | Container for the results of a call to `run` or `done` on the `Pipeline` interface that includes details and statistics about the component stages of the data pipeline. |
`PipelineResult.StageResult` | |
`Tuple3<V1,V2,V3>` | A convenience class for three-element `Tuple`s. |
`Tuple3.Collect<V1,V2,V3>` | |
`Tuple4<V1,V2,V3,V4>` | A convenience class for four-element `Tuple`s. |
`Tuple4.Collect<V1,V2,V3,V4>` | |
`TupleN` | A `Tuple` instance for an arbitrary number of values. |
`Union` | Represents the combination of multiple data sources, which may contain different types of data, as a single type with an index that indicates which of the original sources the current record came from. |
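
The `DoFn`, `MapFn`, and `FilterFn` rows, together with `Emitter`, describe a small family of processing functions: a `DoFn` may write zero or more outputs per input through an `Emitter`, a `MapFn` writes exactly one, and a `FilterFn` passes its input through only when a boolean condition holds. The sketch below models that family in plain Java (9+) so it runs without Crunch on the classpath. The nested types reuse the Crunch type names and the method names `process`, `map`, `accept`, and `emit`, but they are simplified stand-ins; the `parallelDo` helper is a hypothetical sequential loop, not the distributed operation, and `words`, `lengths`, and `longWords` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone model of the DoFn / MapFn / FilterFn family.
// These are NOT the org.apache.crunch classes: there is no serialization,
// no PType, and no cluster, only the shape of the contracts.
public class DoFnModel {

    // Stand-in for Emitter<T>: the sink a function writes its outputs to.
    interface Emitter<T> {
        void emit(T value);
    }

    // Stand-in for DoFn<S,T>: may emit zero, one, or many outputs per input.
    abstract static class DoFn<S, T> {
        abstract void process(S input, Emitter<T> emitter);
    }

    // Stand-in for MapFn<S,T>: process() emits exactly one map() result.
    abstract static class MapFn<S, T> extends DoFn<S, T> {
        abstract T map(S input);

        @Override
        void process(S input, Emitter<T> emitter) {
            emitter.emit(map(input));
        }
    }

    // Stand-in for FilterFn<T>: process() re-emits the input only if accept() is true.
    abstract static class FilterFn<T> extends DoFn<T, T> {
        abstract boolean accept(T input);

        @Override
        void process(T input, Emitter<T> emitter) {
            if (accept(input)) {
                emitter.emit(input);
            }
        }
    }

    // Hypothetical local stand-in for parallelDo: applies fn to every input in order.
    static <S, T> List<T> parallelDo(List<S> inputs, DoFn<S, T> fn) {
        List<T> outputs = new ArrayList<>();
        Emitter<T> emitter = outputs::add; // each emit() appends to the output list
        for (S input : inputs) {
            fn.process(input, emitter);
        }
        return outputs;
    }

    // DoFn example: one line may yield zero or more words.
    static List<String> words(List<String> lines) {
        return parallelDo(lines, new DoFn<String, String>() {
            @Override
            void process(String line, Emitter<String> emitter) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        emitter.emit(word);
                    }
                }
            }
        });
    }

    // MapFn example: exactly one length per word.
    static List<Integer> lengths(List<String> words) {
        return parallelDo(words, new MapFn<String, Integer>() {
            @Override
            Integer map(String word) {
                return word.length();
            }
        });
    }

    // FilterFn example: keep only words strictly longer than longerThan characters.
    static List<String> longWords(List<String> words, int longerThan) {
        return parallelDo(words, new FilterFn<String>() {
            @Override
            boolean accept(String word) {
                return word.length() > longerThan;
            }
        });
    }

    public static void main(String[] args) {
        List<String> ws = words(List.of("the quick fox", "", "jumps"));
        System.out.println(ws);               // [the, quick, fox, jumps]
        System.out.println(lengths(ws));      // [3, 5, 3, 5]
        System.out.println(longWords(ws, 3)); // [quick, jumps]
    }
}
```

Putting the `emit` call inside `MapFn.process` and `FilterFn.process` means subclasses only write a pure `map` or `accept` method; the empty input line shows a `DoFn` legitimately emitting nothing.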
Enum | Description |
---|---|
`PipelineCallable.Status` | |
`PipelineExecution.Status` | |
`Target.WriteMode` | An enum to represent different options the client may specify for handling the case where the output path, table, etc. for a target already exists. |
Exception | Description |
---|---|
`CrunchRuntimeException` | A `RuntimeException` implementation that includes some additional options for the Crunch execution engine to track reporting status. |
Copyright © 2016 The Apache Software Foundation. All rights reserved.