Packages that use org.apache.crunch | |
---|---|
org.apache.crunch | Client-facing API and core abstractions. |
org.apache.crunch.contrib.bloomfilter | Support for creating Bloom Filters. |
org.apache.crunch.contrib.io.jdbc | Support for reading data from an RDBMS using JDBC. |
org.apache.crunch.contrib.text | |
org.apache.crunch.examples | Example applications demonstrating various aspects of Crunch. |
org.apache.crunch.fn | Commonly used functions for manipulating collections. |
org.apache.crunch.impl.dist | |
org.apache.crunch.impl.dist.collect | |
org.apache.crunch.impl.mem | In-memory Pipeline implementation for rapid prototyping and testing. |
org.apache.crunch.impl.mr | A Pipeline implementation that runs on Hadoop MapReduce. |
org.apache.crunch.impl.spark | |
org.apache.crunch.impl.spark.collect | |
org.apache.crunch.impl.spark.fn | |
org.apache.crunch.io | Data input and output for Pipelines. |
org.apache.crunch.io.impl | |
org.apache.crunch.lib | Joining, sorting, aggregating, and other commonly used functionality. |
org.apache.crunch.lib.join | Inner and outer joins on collections. |
org.apache.crunch.lib.sort | |
org.apache.crunch.types | Common functionality for business object serialization. |
org.apache.crunch.types.avro | Business object serialization using Apache Avro. |
org.apache.crunch.types.orc | |
org.apache.crunch.types.writable | Business object serialization using Hadoop's Writables framework. |
org.apache.crunch.util | An assorted set of utilities. |
Classes in org.apache.crunch used by org.apache.crunch | |
---|---|
Aggregator
Aggregate a sequence of values into a possibly smaller sequence of the same type. |
|
CachingOptions
Options for controlling how a PCollection<T> is cached for subsequent processing. |
|
CachingOptions.Builder
A Builder class to use for setting the CachingOptions for a PCollection. |
|
CombineFn
A special DoFn implementation that converts an Iterable of
values into a single value. |
|
DoFn
Base class for all data processing functions in Crunch. |
|
Emitter
Interface for writing outputs from a DoFn. |
|
FilterFn
A DoFn for the common case of filtering the members of a
PCollection based on a boolean condition. |
|
GroupingOptions
Options that can be passed to a groupByKey operation in order to
exercise finer control over how the partitioning, grouping, and sorting of
keys is performed. |
|
GroupingOptions.Builder
Builder class for creating GroupingOptions instances. |
|
MapFn
A DoFn for the common case of emitting exactly one value for each
input record. |
|
Pair
A convenience class for two-element Tuples. |
|
ParallelDoOptions
Container class that includes optional information about a parallelDo operation
applied to a PCollection. |
|
ParallelDoOptions.Builder
|
|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
PGroupedTable
The Crunch representation of a grouped PTable, which corresponds to the output of
the shuffle phase of a MapReduce job. |
|
Pipeline
Manages the state of a pipeline execution. |
|
PipelineCallable
A specialization of Callable that executes some sequential logic on the client machine as
part of an overall Crunch pipeline in order to generate zero or more outputs, some of
which may be PCollection instances that are processed by other jobs in the
pipeline. |
|
PipelineCallable.Status
|
|
PipelineExecution
A handle to allow clients to control a Crunch pipeline as it runs. |
|
PipelineExecution.Status
|
|
PipelineResult
Container for the results of a call to run or done on the
Pipeline interface that includes details and statistics about the component
stages of the data pipeline. |
|
PipelineResult.StageResult
|
|
PObject
A PObject represents a singleton object value that results from a distributed
computation. |
|
PTable
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
|
ReadableData
Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline. |
|
Source
A Source represents an input data set that is an input to one or more
MapReduce jobs. |
|
SourceTarget
An interface for classes that implement both the Source and the
Target interfaces. |
|
TableSource
The interface for Source implementations that return a PTable. |
|
Target
A Target represents the output destination of a Crunch PCollection
in the context of a Crunch job. |
|
Target.WriteMode
An enum to represent different options the client may specify for handling the case where the output path, table, etc. already exists. |
|
Tuple
A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
|
Tuple3
A convenience class for three-element Tuples. |
|
Tuple3.Collect
|
|
Tuple4
A convenience class for four-element Tuples. |
|
Tuple4.Collect
|
|
TupleN
A Tuple instance for an arbitrary number of values. |
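The descriptions above say that DoFn is the base processing function, that it writes its outputs through an Emitter, and that MapFn and FilterFn are special cases (exactly one output per input, and pass-through on a boolean condition). The following is a dependency-free sketch of that relationship, not the Crunch classes themselves: every type here (`SketchEmitter`, `SketchDoFn`, `SketchMapFn`, `SketchFilterFn`) and the `apply` helper are hypothetical stand-ins that run over an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

public class DoFnSketch {
    /** Hypothetical stand-in for an output sink: receives every emitted value. */
    interface SketchEmitter<T> {
        void emit(T value);
    }

    /** Hypothetical stand-in for a general function: zero or more outputs per input. */
    static abstract class SketchDoFn<S, T> {
        abstract void process(S input, SketchEmitter<T> emitter);
    }

    /** Special case: emits exactly one value for each input. */
    static abstract class SketchMapFn<S, T> extends SketchDoFn<S, T> {
        abstract T map(S input);

        @Override
        void process(S input, SketchEmitter<T> emitter) {
            emitter.emit(map(input));
        }
    }

    /** Special case: emits the input unchanged only when accept returns true. */
    static abstract class SketchFilterFn<T> extends SketchDoFn<T, T> {
        abstract boolean accept(T input);

        @Override
        void process(T input, SketchEmitter<T> emitter) {
            if (accept(input)) {
                emitter.emit(input);
            }
        }
    }

    /** Applies fn to each element in order and collects the emitted values in a new list. */
    static <S, T> List<T> apply(List<S> inputs, SketchDoFn<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S in : inputs) {
            fn.process(in, out::add);
        }
        return out;
    }

    /** Keeps words longer than three characters, then maps each to its length. */
    static List<Integer> lengthsOfLongWords(List<String> words) {
        List<String> longWords = apply(words, new SketchFilterFn<String>() {
            @Override
            boolean accept(String w) {
                return w.length() > 3;
            }
        });
        return apply(longWords, new SketchMapFn<String, Integer>() {
            @Override
            Integer map(String w) {
                return w.length();
            }
        });
    }

    public static void main(String[] args) {
        // prints [6, 6]
        System.out.println(lengthsOfLongWords(List.of("a", "crunch", "map", "reduce")));
    }
}
```

In this sketch the input list is never modified: each `apply` call returns a new list, which loosely mirrors the description of PCollection as immutable.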
Classes in org.apache.crunch used by org.apache.crunch.contrib.bloomfilter | |
---|---|
DoFn
Base class for all data processing functions in Crunch. |
|
Emitter
Interface for writing outputs from a DoFn. |
|
Pair
A convenience class for two-element Tuples. |
|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
PObject
A PObject represents a singleton object value that results from a distributed
computation. |
Classes in org.apache.crunch used by org.apache.crunch.contrib.io.jdbc | |
---|---|
Source
A Source represents an input data set that is an input to one or more
MapReduce jobs. |
Classes in org.apache.crunch used by org.apache.crunch.contrib.text | |
---|---|
Pair
A convenience class for two-element Tuples. |
|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
PTable
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
|
Tuple
A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
|
Tuple3
A convenience class for three-element Tuples. |
|
Tuple4
A convenience class for four-element Tuples. |
|
TupleN
A Tuple instance for an arbitrary number of values. |
Classes in org.apache.crunch used by org.apache.crunch.examples | |
---|---|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
PTable
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
Classes in org.apache.crunch used by org.apache.crunch.fn | |
---|---|
Aggregator
Aggregate a sequence of values into a possibly smaller sequence of the same type. |
|
CombineFn
A special DoFn implementation that converts an Iterable of
values into a single value. |
|
DoFn
Base class for all data processing functions in Crunch. |
|
Emitter
Interface for writing outputs from a DoFn. |
|
FilterFn
A DoFn for the common case of filtering the members of a
PCollection based on a boolean condition. |
|
MapFn
A DoFn for the common case of emitting exactly one value for each
input record. |
|
Pair
A convenience class for two-element Tuples. |
|
Tuple3
A convenience class for three-element Tuples. |
|
Tuple4
A convenience class for four-element Tuples. |
|
TupleN
A Tuple instance for an arbitrary number of values. |
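The Aggregator entry above describes aggregating a sequence of values into a possibly smaller sequence of the same type. As a rough illustration of that idea only, the sketch below uses a hypothetical `SketchAggregator` interface (a stand-in, not the Crunch interface) with a reset, update, and results lifecycle, plus a `SumLongs` implementation that collapses any number of values into a single sum.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AggregatorSketch {
    /** Hypothetical stand-in lifecycle: clear state, fold in values, read results. */
    interface SketchAggregator<T> {
        void reset();

        void update(T value);

        Iterable<T> results();
    }

    /** Collapses any number of Long values into a one-element result: their sum. */
    static class SumLongs implements SketchAggregator<Long> {
        private long sum;

        @Override
        public void reset() {
            sum = 0L;
        }

        @Override
        public void update(Long value) {
            sum += value;
        }

        @Override
        public Iterable<Long> results() {
            return Collections.singletonList(sum);
        }
    }

    /** Runs one aggregation pass over the values and returns the first result. */
    static <T> T aggregate(SketchAggregator<T> agg, List<T> values) {
        agg.reset();
        for (T v : values) {
            agg.update(v);
        }
        return agg.results().iterator().next();
    }

    public static void main(String[] args) {
        // prints 6
        System.out.println(aggregate(new SumLongs(), Arrays.asList(1L, 2L, 3L)));
    }
}
```

Because the result has the same type as the inputs, the same aggregator could in principle be applied again to partial results, which is what makes this shape suitable for combining values before and after a shuffle.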
Classes in org.apache.crunch used by org.apache.crunch.impl.dist | |
---|---|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
Pipeline
Manages the state of a pipeline execution. |
|
PipelineCallable
A specialization of Callable that executes some sequential logic on the client machine as
part of an overall Crunch pipeline in order to generate zero or more outputs, some of
which may be PCollection instances that are processed by other jobs in the
pipeline. |
|
PipelineResult
Container for the results of a call to run or done on the
Pipeline interface that includes details and statistics about the component
stages of the data pipeline. |
|
PTable
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
|
Source
A Source represents an input data set that is an input to one or more
MapReduce jobs. |
|
SourceTarget
An interface for classes that implement both the Source and the
Target interfaces. |
|
TableSource
The interface for Source implementations that return a PTable. |
|
Target
A Target represents the output destination of a Crunch PCollection
in the context of a Crunch job. |
|
Target.WriteMode
An enum to represent different options the client may specify for handling the case where the output path, table, etc. already exists. |
Classes in org.apache.crunch used by org.apache.crunch.impl.dist.collect | |
---|---|
Aggregator
Aggregate a sequence of values into a possibly smaller sequence of the same type. |
|
CachingOptions
Options for controlling how a PCollection<T> is cached for subsequent processing. |
|
CombineFn
A special DoFn implementation that converts an Iterable of
values into a single value. |
|
DoFn
Base class for all data processing functions in Crunch. |
|
FilterFn
A DoFn for the common case of filtering the members of a
PCollection based on a boolean condition. |
|
GroupingOptions
Options that can be passed to a groupByKey operation in order to
exercise finer control over how the partitioning, grouping, and sorting of
keys is performed. |
|
MapFn
A DoFn for the common case of emitting exactly one value for each
input record. |
|
Pair
A convenience class for two-element Tuples. |
|
ParallelDoOptions
Container class that includes optional information about a parallelDo operation
applied to a PCollection. |
|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
PGroupedTable
The Crunch representation of a grouped PTable, which corresponds to the output of
the shuffle phase of a MapReduce job. |
|
PipelineCallable
A specialization of Callable that executes some sequential logic on the client machine as
part of an overall Crunch pipeline in order to generate zero or more outputs, some of
which may be PCollection instances that are processed by other jobs in the
pipeline. |
|
PObject
A PObject represents a singleton object value that results from a distributed
computation. |
|
PTable
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
|
ReadableData
Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline. |
|
Source
A Source represents an input data set that is an input to one or more
MapReduce jobs. |
|
SourceTarget
An interface for classes that implement both the Source and the
Target interfaces. |
|
TableSource
The interface for Source implementations that return a PTable. |
|
Target
A Target represents the output destination of a Crunch PCollection
in the context of a Crunch job. |
|
Target.WriteMode
An enum to represent different options the client may specify for handling the case where the output path, table, etc. already exists. |
Classes in org.apache.crunch used by org.apache.crunch.impl.mem | |
---|---|
CachingOptions
Options for controlling how a PCollection<T> is cached for subsequent processing. |
|
Pair
A convenience class for two-element Tuples. |
|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
Pipeline
Manages the state of a pipeline execution. |
|
PipelineCallable
A specialization of Callable that executes some sequential logic on the client machine as
part of an overall Crunch pipeline in order to generate zero or more outputs, some of
which may be PCollection instances that are processed by other jobs in the
pipeline. |
|
PipelineExecution
A handle to allow clients to control a Crunch pipeline as it runs. |
|
PipelineResult
Container for the results of a call to run or done on the
Pipeline interface that includes details and statistics about the component
stages of the data pipeline. |
|
PTable
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
|
Source
A Source represents an input data set that is an input to one or more
MapReduce jobs. |
|
TableSource
The interface for Source implementations that return a PTable. |
|
Target
A Target represents the output destination of a Crunch PCollection
in the context of a Crunch job. |
|
Target.WriteMode
An enum to represent different options the client may specify for handling the case where the output path, table, etc. already exists. |
Classes in org.apache.crunch used by org.apache.crunch.impl.mr | |
---|---|
CachingOptions
Options for controlling how a PCollection<T> is cached for subsequent processing. |
|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
Pipeline
Manages the state of a pipeline execution. |
|
PipelineExecution
A handle to allow clients to control a Crunch pipeline as it runs. |
|
PipelineResult
Container for the results of a call to run or done on the
Pipeline interface that includes details and statistics about the component
stages of the data pipeline. |
Classes in org.apache.crunch used by org.apache.crunch.impl.spark | |
---|---|
CachingOptions
Options for controlling how a PCollection<T> is cached for subsequent processing. |
|
CombineFn
A special DoFn implementation that converts an Iterable of
values into a single value. |
|
DoFn
Base class for all data processing functions in Crunch. |
|
GroupingOptions
Options that can be passed to a groupByKey operation in order to
exercise finer control over how the partitioning, grouping, and sorting of
keys is performed. |
|
Pair
A convenience class for two-element Tuples. |
|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
Pipeline
Manages the state of a pipeline execution. |
|
PipelineCallable
A specialization of Callable that executes some sequential logic on the client machine as
part of an overall Crunch pipeline in order to generate zero or more outputs, some of
which may be PCollection instances that are processed by other jobs in the
pipeline. |
|
PipelineExecution
A handle to allow clients to control a Crunch pipeline as it runs. |
|
PipelineExecution.Status
|
|
PipelineResult
Container for the results of a call to run or done on the
Pipeline interface that includes details and statistics about the component
stages of the data pipeline. |
|
PTable
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
|
Target
A Target represents the output destination of a Crunch PCollection
in the context of a Crunch job. |
Classes in org.apache.crunch used by org.apache.crunch.impl.spark.collect | |
---|---|
CombineFn
A special DoFn implementation that converts an Iterable of
values into a single value. |
|
DoFn
Base class for all data processing functions in Crunch. |
|
GroupingOptions
Options that can be passed to a groupByKey operation in order to
exercise finer control over how the partitioning, grouping, and sorting of
keys is performed. |
|
Pair
A convenience class for two-element Tuples. |
|
ParallelDoOptions
Container class that includes optional information about a parallelDo operation
applied to a PCollection. |
|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
PGroupedTable
The Crunch representation of a grouped PTable, which corresponds to the output of
the shuffle phase of a MapReduce job. |
|
PTable
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
|
Source
A Source represents an input data set that is an input to one or more
MapReduce jobs. |
|
TableSource
The interface for Source implementations that return a PTable. |
Classes in org.apache.crunch used by org.apache.crunch.impl.spark.fn | |
---|---|
CombineFn
A special DoFn implementation that converts an Iterable of
values into a single value. |
|
DoFn
Base class for all data processing functions in Crunch. |
|
GroupingOptions
Options that can be passed to a groupByKey operation in order to
exercise finer control over how the partitioning, grouping, and sorting of
keys is performed. |
|
MapFn
A DoFn for the common case of emitting exactly one value for each
input record. |
|
Pair
A convenience class for two-element Tuples. |
Classes in org.apache.crunch used by org.apache.crunch.io | |
---|---|
ReadableData
Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline. |
|
Source
A Source represents an input data set that is an input to one or more
MapReduce jobs. |
|
SourceTarget
An interface for classes that implement both the Source and the
Target interfaces. |
|
TableSource
The interface for Source implementations that return a PTable. |
|
TableSourceTarget
An interface for classes that implement both the TableSource and the
Target interfaces. |
|
Target
A Target represents the output destination of a Crunch PCollection
in the context of a Crunch job. |
Classes in org.apache.crunch used by org.apache.crunch.io.impl | |
---|---|
Source
A Source represents an input data set that is an input to one or more
MapReduce jobs. |
Classes in org.apache.crunch used by org.apache.crunch.lib | |
---|---|
Aggregator
Aggregate a sequence of values into a possibly smaller sequence of the same type. |
|
CombineFn
A special DoFn implementation that converts an Iterable of
values into a single value. |
|
DoFn
Base class for all data processing functions in Crunch. |
|
Emitter
Interface for writing outputs from a DoFn. |
|
MapFn
A DoFn for the common case of emitting exactly one value for each
input record. |
|
Pair
A convenience class for two-element Tuples. |
|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
PGroupedTable
The Crunch representation of a grouped PTable, which corresponds to the output of
the shuffle phase of a MapReduce job. |
|
PObject
A PObject represents a singleton object value that results from a distributed
computation. |
|
PTable
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
|
Tuple
A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
|
Tuple3
A convenience class for three-element Tuples. |
|
Tuple3.Collect
|
|
Tuple4
A convenience class for four-element Tuples. |
|
Tuple4.Collect
|
|
TupleN
A Tuple instance for an arbitrary number of values. |
Classes in org.apache.crunch used by org.apache.crunch.lib.join | |
---|---|
DoFn
Base class for all data processing functions in Crunch. |
|
Emitter
Interface for writing outputs from a DoFn. |
|
Pair
A convenience class for two-element Tuples. |
|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
PTable
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
Classes in org.apache.crunch used by org.apache.crunch.lib.sort | |
---|---|
DoFn
Base class for all data processing functions in Crunch. |
|
MapFn
A DoFn for the common case of emitting exactly one value for each
input record. |
|
Tuple
A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
Classes in org.apache.crunch used by org.apache.crunch.types | |
---|---|
DoFn
Base class for all data processing functions in Crunch. |
|
GroupingOptions
Options that can be passed to a groupByKey operation in order to
exercise finer control over how the partitioning, grouping, and sorting of
keys is performed. |
|
MapFn
A DoFn for the common case of emitting exactly one value for each
input record. |
|
Pair
A convenience class for two-element Tuples. |
|
Tuple
A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
|
Tuple3
A convenience class for three-element Tuples. |
|
Tuple4
A convenience class for four-element Tuples. |
|
TupleN
A Tuple instance for an arbitrary number of values. |
|
Union
Allows us to represent the combination of multiple data sources that may contain different types of data as a single type with an index to indicate which of the original sources the current record was from. |
Classes in org.apache.crunch used by org.apache.crunch.types.avro | |
---|---|
MapFn
A DoFn for the common case of emitting exactly one value for each
input record. |
|
Pair
A convenience class for two-element Tuples. |
|
Tuple
A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
|
Tuple3
A convenience class for three-element Tuples. |
|
Tuple4
A convenience class for four-element Tuples. |
|
TupleN
A Tuple instance for an arbitrary number of values. |
|
Union
Allows us to represent the combination of multiple data sources that may contain different types of data as a single type with an index to indicate which of the original sources the current record was from. |
Classes in org.apache.crunch used by org.apache.crunch.types.orc | |
---|---|
Tuple
A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
|
TupleN
A Tuple instance for an arbitrary number of values. |
Classes in org.apache.crunch used by org.apache.crunch.types.writable | |
---|---|
MapFn
A DoFn for the common case of emitting exactly one value for each
input record. |
|
Pair
A convenience class for two-element Tuples. |
|
Tuple
A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
|
Tuple3
A convenience class for three-element Tuples. |
|
Tuple4
A convenience class for four-element Tuples. |
|
TupleN
A Tuple instance for an arbitrary number of values. |
|
Union
Allows us to represent the combination of multiple data sources that may contain different types of data as a single type with an index to indicate which of the original sources the current record was from. |
Classes in org.apache.crunch used by org.apache.crunch.util | |
---|---|
DoFn
Base class for all data processing functions in Crunch. |
|
Pair
A convenience class for two-element Tuples. |
|
PCollection
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
|
PipelineExecution
A handle to allow clients to control a Crunch pipeline as it runs. |
|
PipelineResult
Container for the results of a call to run or done on the
Pipeline interface that includes details and statistics about the component
stages of the data pipeline. |
|
PTable
A sub-interface of PCollection that represents an immutable,
distributed multi-map of keys and values. |
|
ReadableData
Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline. |
|
Source
A Source represents an input data set that is an input to one or more
MapReduce jobs. |
|
SourceTarget
An interface for classes that implement both the Source and the
Target interfaces. |
|
TableSource
The interface for Source implementations that return a PTable. |
|
Target
A Target represents the output destination of a Crunch PCollection
in the context of a Crunch job. |
|
Tuple3
A convenience class for three-element Tuples. |
|
Tuple4
A convenience class for four-element Tuples. |
|
TupleN
A Tuple instance for an arbitrary number of values. |
|