Package org.apache.crunch

Client-facing API and core abstractions.


Interface Summary
Aggregator<T> Aggregate a sequence of values into a possibly smaller sequence of the same type.
Emitter<T> Interface for writing outputs from a DoFn.
PCollection<S> A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
PGroupedTable<K,V> The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job.
Pipeline Manages the state of a pipeline execution.
PipelineExecution A handle to allow clients to control a Crunch pipeline as it runs.
PObject<T> A PObject represents a singleton object value that results from a distributed computation.
PTable<K,V> A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
ReadableData<T> Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline.
Source<T> A Source represents a data set that is an input to one or more MapReduce jobs.
SourceTarget<T> An interface for classes that implement both the Source and the Target interfaces.
TableSource<K,V> The interface for Source implementations that return a PTable.
TableSourceTarget<K,V> An interface for classes that implement both the TableSource and the Target interfaces.
Target A Target represents the output destination of a Crunch PCollection in the context of a Crunch job.
Tuple A fixed-size collection of Objects, used in Crunch for representing joins between PCollections.
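
The Aggregator summary above ("aggregate a sequence of values into a possibly smaller sequence of the same type") can be illustrated with a minimal plain-Java sketch. The names SimpleAggregator, SumLongs, and AggregatorSketch are hypothetical stand-ins written for this example; they are not the org.apache.crunch types, and the reset/update/results method set is an assumption about the general shape of such an interface.

```java
import java.util.Arrays;
import java.util.Collections;

// Hypothetical stand-in for the aggregator idea; NOT the Crunch interface.
interface SimpleAggregator<T> {
    void reset();            // clear state before a new sequence of values
    void update(T value);    // fold one value into the running state
    Iterable<T> results();   // the (possibly smaller) output sequence
}

// Sums longs: many inputs collapse into one output of the same type.
class SumLongs implements SimpleAggregator<Long> {
    private long sum;

    @Override public void reset() { sum = 0L; }
    @Override public void update(Long value) { sum += value; }
    @Override public Iterable<Long> results() {
        return Collections.singletonList(sum);
    }
}

class AggregatorSketch {
    // Drives an aggregator over an in-memory sequence of values.
    static <T> Iterable<T> aggregate(SimpleAggregator<T> agg, Iterable<T> values) {
        agg.reset();
        for (T v : values) {
            agg.update(v);
        }
        return agg.results();
    }

    public static void main(String[] args) {
        // Three inputs reduce to a single-element sequence.
        System.out.println(aggregate(new SumLongs(), Arrays.asList(1L, 2L, 3L)));
    }
}
```

In this sketch the input and output share the type Long, which is the property the summary emphasizes; an aggregator that kept, say, the top two values would return a two-element sequence instead.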

Class Summary
CachingOptions Options for controlling how a PCollection<T> is cached for subsequent processing.
CachingOptions.Builder A Builder class to use for setting the CachingOptions for a PCollection.
CombineFn<S,T> A special DoFn implementation that converts an Iterable of values into a single value.
DoFn<S,T> Base class for all data processing functions in Crunch.
FilterFn<T> A DoFn for the common case of filtering the members of a PCollection based on a boolean condition.
GroupingOptions Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed.
GroupingOptions.Builder Builder class for creating GroupingOptions instances.
MapFn<S,T> A DoFn for the common case of emitting exactly one value for each input record.
Pair<K,V> A convenience class for two-element Tuples.
ParallelDoOptions Container class that includes optional information about a parallelDo operation applied to a PCollection.
PipelineResult Container for the results of a call to run or done on the Pipeline interface, including details and statistics about the component stages of the data pipeline.
Tuple3<V1,V2,V3> A convenience class for three-element Tuples.
Tuple4<V1,V2,V3,V4> A convenience class for four-element Tuples.
TupleN A Tuple instance for an arbitrary number of values.
Union Allows us to represent the combination of multiple data sources that may contain different types of data as a single type with an index to indicate which of the original sources the current record was from.
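
The relationship between DoFn, MapFn, FilterFn, and Emitter described in the summaries above can be sketched in plain Java. Everything below (SimpleEmitter, SimpleDoFn, SimpleMapFn, SimpleFilterFn, DoFnSketch) is a hypothetical stand-in that runs over in-memory lists; it is not the Crunch implementation and leaves out everything related to distributed execution.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins mirroring the summaries above; NOT the Crunch classes.
interface SimpleEmitter<T> {
    void emit(T value);
}

// General case: each input may produce zero, one, or many outputs.
abstract class SimpleDoFn<S, T> {
    abstract void process(S input, SimpleEmitter<T> emitter);
}

// Common case: exactly one output for each input record.
abstract class SimpleMapFn<S, T> extends SimpleDoFn<S, T> {
    abstract T map(S input);

    @Override void process(S input, SimpleEmitter<T> emitter) {
        emitter.emit(map(input));
    }
}

// Common case: keep or drop each input based on a boolean condition.
abstract class SimpleFilterFn<T> extends SimpleDoFn<T, T> {
    abstract boolean accept(T input);

    @Override void process(T input, SimpleEmitter<T> emitter) {
        if (accept(input)) {
            emitter.emit(input);
        }
    }
}

class DoFnSketch {
    // Applies a function to every input and collects whatever it emits.
    static <S, T> List<T> apply(List<S> inputs, SimpleDoFn<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S in : inputs) {
            fn.process(in, out::add);
        }
        return out;
    }

    // One line may emit many words, so this needs the general DoFn form.
    static List<String> words(List<String> lines) {
        return apply(lines, new SimpleDoFn<String, String>() {
            @Override void process(String line, SimpleEmitter<String> emitter) {
                for (String w : line.split("\\s+")) {
                    if (!w.isEmpty()) {
                        emitter.emit(w);
                    }
                }
            }
        });
    }

    // One length per word: the MapFn case.
    static List<Integer> lengths(List<String> words) {
        return apply(words, new SimpleMapFn<String, Integer>() {
            @Override Integer map(String w) { return w.length(); }
        });
    }

    // Keep only words with at least minLength characters: the FilterFn case.
    static List<String> longWords(List<String> words, final int minLength) {
        return apply(words, new SimpleFilterFn<String>() {
            @Override boolean accept(String w) { return w.length() >= minLength; }
        });
    }
}
```

The design point the sketch tries to show is that the map and filter cases are specializations of the general emit-based function, which is why a single processing entry point can accept all three.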

Enum Summary
Target.WriteMode An enum to represent different options the client may specify for handling the case where the output path, table, etc. already exists.
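
As a rough illustration of what a write-mode option decides, the plain-Java sketch below models the choice a client makes when output is already present. SketchWriteMode and WriteModeSketch are hypothetical names for this example only. The constant names and the behavior attached to each one are assumptions made for illustration, not a statement of how Crunch itself behaves; consult the Target.WriteMode documentation for the actual semantics.

```java
// Hypothetical stand-in enum; constants and semantics are assumptions.
enum SketchWriteMode { DEFAULT, OVERWRITE, APPEND, CHECKPOINT }

class WriteModeSketch {
    // Returns a label for the action taken, or throws if the write must fail.
    static String plan(SketchWriteMode mode, boolean outputExists,
                       boolean outputNewerThanInputs) {
        if (!outputExists) {
            return "write";                      // nothing to reconcile
        }
        switch (mode) {
            case OVERWRITE:
                return "delete-then-write";      // replace existing output
            case APPEND:
                return "append";                 // add to existing output
            case CHECKPOINT:
                // Assumed: reuse output that is newer than its inputs.
                return outputNewerThanInputs ? "reuse-existing" : "delete-then-write";
            default:
                // Assumed: the default mode refuses to touch existing output.
                throw new IllegalStateException("output already exists");
        }
    }
}
```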

Exception Summary
CrunchRuntimeException A RuntimeException implementation that includes some additional options for the Crunch execution engine to track reporting status.

Package org.apache.crunch Description

See Also:
Introduction to Apache Crunch

Copyright © 2014 The Apache Software Foundation. All Rights Reserved.