Package | Description |
---|---|
org.apache.crunch | Client-facing API and core abstractions. |
org.apache.crunch.contrib.bloomfilter | Support for creating Bloom Filters. |
org.apache.crunch.contrib.io.jdbc | Support for reading data from RDBMS using JDBC. |
org.apache.crunch.contrib.text | |
org.apache.crunch.examples | Example applications demonstrating various aspects of Crunch. |
org.apache.crunch.fn | Commonly used functions for manipulating collections. |
org.apache.crunch.impl.mem | In-memory Pipeline implementation for rapid prototyping and testing. |
org.apache.crunch.impl.mr | A Pipeline implementation that runs on Hadoop MapReduce. |
org.apache.crunch.io | Data input and output for Pipelines. |
org.apache.crunch.lib | Joining, sorting, aggregating, and other commonly used functionality. |
org.apache.crunch.lib.join | Inner and outer joins on collections. |
org.apache.crunch.types | Common functionality for business object serialization. |
org.apache.crunch.types.avro | Business object serialization using Apache Avro. |
org.apache.crunch.types.writable | Business object serialization using Hadoop's Writables framework. |
org.apache.crunch.util | An assorted set of utilities. |

Classes in org.apache.crunch used by org.apache.crunch

Class | Description |
---|---|
Aggregator | Aggregate a sequence of values into a possibly smaller sequence of the same type. |
CombineFn | |
CombineFn.Aggregator | Deprecated. Use Aggregator. |
CombineFn.AggregatorFactory | Deprecated. Use PGroupedTable.combineValues(Aggregator), which doesn't require a factory. |
CombineFn.SimpleAggregator | Deprecated. |
DoFn | Base class for all data processing functions in Crunch. |
Emitter | Interface for writing outputs from a DoFn. |
FilterFn | A DoFn for the common case of filtering the members of a PCollection based on a boolean condition. |
GroupingOptions | Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed. |
GroupingOptions.Builder | Builder class for creating GroupingOptions instances. |
MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
Pair | A convenience class for two-element Tuples. |
ParallelDoOptions | Container class that includes optional information about a parallelDo operation applied to a PCollection. |
ParallelDoOptions.Builder | |
PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
PGroupedTable | The Crunch representation of a grouped PTable. |
Pipeline | Manages the state of a pipeline execution. |
PipelineResult | Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline. |
PipelineResult.StageResult | |
PObject | A PObject represents a singleton object value that results from a distributed computation. |
PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
SourceTarget | An interface for classes that implement both the Source and the Target interfaces. |
TableSource | The interface for Source implementations that return a PTable. |
Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |
Target.WriteMode | An enum to represent different options the client may specify for handling the case where the output path, table, etc. already exists. |
Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
Tuple3 | A convenience class for three-element Tuples. |
Tuple4 | A convenience class for four-element Tuples. |
TupleN | A Tuple instance for an arbitrary number of values. |

Classes in org.apache.crunch used by org.apache.crunch.contrib.bloomfilter

Class | Description |
---|---|
DoFn | Base class for all data processing functions in Crunch. |
Emitter | Interface for writing outputs from a DoFn. |
Pair | A convenience class for two-element Tuples. |
PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
PObject | A PObject represents a singleton object value that results from a distributed computation. |

Classes in org.apache.crunch used by org.apache.crunch.contrib.io.jdbc

Class | Description |
---|---|
Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |

Classes in org.apache.crunch used by org.apache.crunch.contrib.text

Class | Description |
---|---|
Pair | A convenience class for two-element Tuples. |
PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
Tuple3 | A convenience class for three-element Tuples. |
Tuple4 | A convenience class for four-element Tuples. |
TupleN | A Tuple instance for an arbitrary number of values. |

Classes in org.apache.crunch used by org.apache.crunch.examples

Class | Description |
---|---|
PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |

Classes in org.apache.crunch used by org.apache.crunch.fn

Class | Description |
---|---|
Aggregator | Aggregate a sequence of values into a possibly smaller sequence of the same type. |
CombineFn | |
DoFn | Base class for all data processing functions in Crunch. |
Emitter | Interface for writing outputs from a DoFn. |
FilterFn | A DoFn for the common case of filtering the members of a PCollection based on a boolean condition. |
MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
Pair | A convenience class for two-element Tuples. |
Tuple3 | A convenience class for three-element Tuples. |
Tuple4 | A convenience class for four-element Tuples. |
TupleN | A Tuple instance for an arbitrary number of values. |

Classes in org.apache.crunch used by org.apache.crunch.impl.mem

Class | Description |
---|---|
Pair | A convenience class for two-element Tuples. |
PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
Pipeline | Manages the state of a pipeline execution. |
PipelineResult | Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline. |
PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
TableSource | The interface for Source implementations that return a PTable. |
Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |
Target.WriteMode | An enum to represent different options the client may specify for handling the case where the output path, table, etc. already exists. |

Classes in org.apache.crunch used by org.apache.crunch.impl.mr

Class | Description |
---|---|
PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
Pipeline | Manages the state of a pipeline execution. |
PipelineResult | Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline. |
PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
SourceTarget | An interface for classes that implement both the Source and the Target interfaces. |
TableSource | The interface for Source implementations that return a PTable. |
Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |
Target.WriteMode | An enum to represent different options the client may specify for handling the case where the output path, table, etc. already exists. |

Classes in org.apache.crunch used by org.apache.crunch.io

Class | Description |
---|---|
Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
SourceTarget | An interface for classes that implement both the Source and the Target interfaces. |
TableSource | The interface for Source implementations that return a PTable. |
TableSourceTarget | An interface for classes that implement both the TableSource and the Target interfaces. |
Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |

Classes in org.apache.crunch used by org.apache.crunch.lib

Class | Description |
---|---|
CombineFn | |
DoFn | Base class for all data processing functions in Crunch. |
Emitter | Interface for writing outputs from a DoFn. |
Pair | A convenience class for two-element Tuples. |
PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
PObject | A PObject represents a singleton object value that results from a distributed computation. |
PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
Tuple3 | A convenience class for three-element Tuples. |
Tuple4 | A convenience class for four-element Tuples. |
TupleN | A Tuple instance for an arbitrary number of values. |

Classes in org.apache.crunch used by org.apache.crunch.lib.join

Class | Description |
---|---|
DoFn | Base class for all data processing functions in Crunch. |
Emitter | Interface for writing outputs from a DoFn. |
Pair | A convenience class for two-element Tuples. |
PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |

Classes in org.apache.crunch used by org.apache.crunch.types

Class | Description |
---|---|
DoFn | Base class for all data processing functions in Crunch. |
GroupingOptions | Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed. |
MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
Pair | A convenience class for two-element Tuples. |
SourceTarget | An interface for classes that implement both the Source and the Target interfaces. |
Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
Tuple3 | A convenience class for three-element Tuples. |
Tuple4 | A convenience class for four-element Tuples. |
TupleN | A Tuple instance for an arbitrary number of values. |

Classes in org.apache.crunch used by org.apache.crunch.types.avro

Class | Description |
---|---|
MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
Pair | A convenience class for two-element Tuples. |
SourceTarget | An interface for classes that implement both the Source and the Target interfaces. |
Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
Tuple3 | A convenience class for three-element Tuples. |
Tuple4 | A convenience class for four-element Tuples. |
TupleN | A Tuple instance for an arbitrary number of values. |

Classes in org.apache.crunch used by org.apache.crunch.types.writable

Class | Description |
---|---|
MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
Pair | A convenience class for two-element Tuples. |
SourceTarget | An interface for classes that implement both the Source and the Target interfaces. |
Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
Tuple3 | A convenience class for three-element Tuples. |
Tuple4 | A convenience class for four-element Tuples. |
TupleN | A Tuple instance for an arbitrary number of values. |

Classes in org.apache.crunch used by org.apache.crunch.util

Class | Description |
---|---|
Pair | A convenience class for two-element Tuples. |
PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
TableSource | The interface for Source implementations that return a PTable. |
Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |
Tuple3 | A convenience class for three-element Tuples. |
Tuple4 | A convenience class for four-element Tuples. |
TupleN | A Tuple instance for an arbitrary number of values. |
Copyright © 2013 The Apache Software Foundation. All Rights Reserved.