| Package | Description | 
|---|---|
| org.apache.crunch | Client-facing API and core abstractions. | 
| org.apache.crunch.contrib.bloomfilter | Support for creating Bloom Filters. | 
| org.apache.crunch.contrib.io.jdbc | Support for reading data from an RDBMS using JDBC. |
| org.apache.crunch.contrib.text | |
| org.apache.crunch.examples | Example applications demonstrating various aspects of Crunch. | 
| org.apache.crunch.fn | Commonly used functions for manipulating collections. | 
| org.apache.crunch.impl.dist | |
| org.apache.crunch.impl.dist.collect | |
| org.apache.crunch.impl.mem | In-memory Pipeline implementation for rapid prototyping and testing. | 
| org.apache.crunch.impl.mr | A Pipeline implementation that runs on Hadoop MapReduce. | 
| org.apache.crunch.impl.spark | |
| org.apache.crunch.impl.spark.collect | |
| org.apache.crunch.impl.spark.fn | |
| org.apache.crunch.io | Data input and output for Pipelines. | 
| org.apache.crunch.io.impl | |
| org.apache.crunch.kafka | |
| org.apache.crunch.kafka.inputformat | |
| org.apache.crunch.lambda | Alternative Crunch API using Java 8 features to allow construction of pipelines using lambda functions and method references. |
| org.apache.crunch.lib | Joining, sorting, aggregating, and other commonly used functionality. | 
| org.apache.crunch.lib.join | Inner and outer joins on collections. | 
| org.apache.crunch.lib.sort | |
| org.apache.crunch.types | Common functionality for business object serialization. | 
| org.apache.crunch.types.avro | Business object serialization using Apache Avro. | 
| org.apache.crunch.types.orc | |
| org.apache.crunch.types.writable | Business object serialization using Hadoop's Writables framework. | 
| org.apache.crunch.util | An assorted set of utilities. | 
| Class | Description |
|---|---|
| Aggregator | Aggregate a sequence of values into a possibly smaller sequence of the same type. |
| CachingOptions | Options for controlling how a PCollection<T> is cached for subsequent processing. |
| CachingOptions.Builder | A Builder class to use for setting the CachingOptions for a PCollection. |
| CombineFn | |
| CreateOptions | Additional options that can be specified when creating a new PCollection using Pipeline.create(java.lang.Iterable<T>, org.apache.crunch.types.PType<T>). |
| DoFn | Base class for all data processing functions in Crunch. |
| Emitter | Interface for writing outputs from a DoFn. |
| FilterFn | A DoFn for the common case of filtering the members of a PCollection based on a boolean condition. |
| GroupingOptions | Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed. |
| GroupingOptions.Builder | Builder class for creating GroupingOptions instances. |
| MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
| Pair | A convenience class for two-element Tuples. |
| ParallelDoOptions | Container class that includes optional information about a parallelDo operation applied to a PCollection. |
| ParallelDoOptions.Builder | |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| PGroupedTable | The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job. |
| Pipeline | Manages the state of a pipeline execution. |
| PipelineCallable | A specialization of Callable that executes some sequential logic on the client machine as part of an overall Crunch pipeline in order to generate zero or more outputs, some of which may be PCollection instances that are processed by other jobs in the pipeline. |
| PipelineCallable.Status | |
| PipelineExecution | A handle to allow clients to control a Crunch pipeline as it runs. |
| PipelineExecution.Status | |
| PipelineResult | Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline. |
| PipelineResult.StageResult | |
| PObject | A PObject represents a singleton object value that results from a distributed computation. |
| PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
| ReadableData | Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline. |
| Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
| SourceTarget | An interface for classes that implement both the Source and the Target interfaces. |
| TableSource | The interface for Source implementations that return a PTable. |
| Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |
| Target.WriteMode | An enum to represent different options the client may specify for handling the case where the output path, table, etc. |
| Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
| Tuple3 | A convenience class for three-element Tuples. |
| Tuple3.Collect | |
| Tuple4 | A convenience class for four-element Tuples. |
| Tuple4.Collect | |
| TupleN | A Tuple instance for an arbitrary number of values. |
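
The relationship the table describes between DoFn, Emitter, MapFn, and FilterFn can be illustrated with a small standalone sketch. It does not use the Crunch classes: every type below (`SketchEmitter`, `SketchDoFn`, `SketchMapFn`, `SketchFilterFn`) and the `apply` helper are hypothetical stand-ins that run on an in-memory list, and the method names are assumptions chosen to mirror the descriptions, not a statement of the library's signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DoFnSketch {
    /** Hypothetical stand-in for the Emitter role: it receives a function's outputs. */
    interface SketchEmitter<T> {
        void emit(T value);
    }

    /** Hypothetical stand-in for the DoFn role: each input may yield zero or more outputs. */
    static abstract class SketchDoFn<S, T> {
        abstract void process(S input, SketchEmitter<T> emitter);
    }

    /** Stand-in for the MapFn role: exactly one output for each input record. */
    static abstract class SketchMapFn<S, T> extends SketchDoFn<S, T> {
        abstract T map(S input);

        @Override
        void process(S input, SketchEmitter<T> emitter) {
            emitter.emit(map(input));
        }
    }

    /** Stand-in for the FilterFn role: an input passes through only if accept() is true. */
    static abstract class SketchFilterFn<T> extends SketchDoFn<T, T> {
        abstract boolean accept(T input);

        @Override
        void process(T input, SketchEmitter<T> emitter) {
            if (accept(input)) {
                emitter.emit(input);
            }
        }
    }

    /** Runs fn over an in-memory list and collects everything it emits. */
    static <S, T> List<T> apply(List<S> inputs, SketchDoFn<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S input : inputs) {
            fn.process(input, out::add);
        }
        return out;
    }

    /** Drops empty strings, then maps each remaining string to its length. */
    public static List<Integer> demo() {
        List<String> words = Arrays.asList("crunch", "", "pipeline");
        List<String> nonEmpty = apply(words, new SketchFilterFn<String>() {
            @Override
            boolean accept(String s) {
                return !s.isEmpty();
            }
        });
        return apply(nonEmpty, new SketchMapFn<String, Integer>() {
            @Override
            Integer map(String s) {
                return s.length();
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [6, 8]
    }
}
```

In this sketch the map and filter cases are both expressed through the single `process` method, which is one way to read the table's description of MapFn and FilterFn as common cases of DoFn.
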
| Class | Description |
|---|---|
| DoFn | Base class for all data processing functions in Crunch. |
| Emitter | Interface for writing outputs from a DoFn. |
| Pair | A convenience class for two-element Tuples. |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| PObject | A PObject represents a singleton object value that results from a distributed computation. |
| Class | Description |
|---|---|
| Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
| Class | Description |
|---|---|
| Pair | A convenience class for two-element Tuples. |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
| Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
| Tuple3 | A convenience class for three-element Tuples. |
| Tuple4 | A convenience class for four-element Tuples. |
| TupleN | A Tuple instance for an arbitrary number of values. |
| Class | Description |
|---|---|
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
| Class | Description |
|---|---|
| Aggregator | Aggregate a sequence of values into a possibly smaller sequence of the same type. |
| CombineFn | |
| DoFn | Base class for all data processing functions in Crunch. |
| Emitter | Interface for writing outputs from a DoFn. |
| FilterFn | A DoFn for the common case of filtering the members of a PCollection based on a boolean condition. |
| MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
| Pair | A convenience class for two-element Tuples. |
| Tuple3 | A convenience class for three-element Tuples. |
| Tuple4 | A convenience class for four-element Tuples. |
| TupleN | A Tuple instance for an arbitrary number of values. |
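
The Aggregator description ("a sequence of values into a possibly smaller sequence of the same type") can be made concrete with a standalone sketch. The `SketchAggregator` interface, its `reset`/`update`/`results` methods, and the two implementations are hypothetical names introduced here for illustration; they model the idea on in-memory lists and should not be read as the Crunch interface itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AggregatorSketch {
    /** Hypothetical stand-in: values go in one at a time, fewer values of the same type come out. */
    interface SketchAggregator<T> {
        void reset();
        void update(T value);
        List<T> results();
    }

    /** Collapses any number of longs into a single sum. */
    static class SumLongs implements SketchAggregator<Long> {
        private long sum;

        public void reset() { sum = 0L; }
        public void update(Long value) { sum += value; }
        public List<Long> results() { return Collections.singletonList(Long.valueOf(sum)); }
    }

    /** Keeps only the first n values it sees, so the output has at most n elements. */
    static class FirstN<T> implements SketchAggregator<T> {
        private final int limit;
        private final List<T> kept = new ArrayList<>();

        FirstN(int limit) { this.limit = limit; }

        public void reset() { kept.clear(); }
        public void update(T value) {
            if (kept.size() < limit) {
                kept.add(value);
            }
        }
        public List<T> results() { return new ArrayList<>(kept); }
    }

    /** Resets the aggregator, feeds it every value, and returns its results. */
    static <T> List<T> aggregate(List<T> values, SketchAggregator<T> agg) {
        agg.reset();
        for (T value : values) {
            agg.update(value);
        }
        return agg.results();
    }

    public static List<Long> sumDemo() {
        return aggregate(Arrays.asList(3L, 4L, 5L), new SumLongs());
    }

    public static List<Long> firstTwoDemo() {
        return aggregate(Arrays.asList(3L, 4L, 5L), new FirstN<Long>(2));
    }

    public static void main(String[] args) {
        System.out.println(sumDemo());      // prints [12]
        System.out.println(firstTwoDemo()); // prints [3, 4]
    }
}
```

Both implementations return values of the input type, which is the property the description emphasizes: the output is a shorter sequence, not a different kind of value.
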
| Class | Description |
|---|---|
| CreateOptions | Additional options that can be specified when creating a new PCollection using Pipeline.create(java.lang.Iterable<T>, org.apache.crunch.types.PType<T>). |
| Pair | A convenience class for two-element Tuples. |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| Pipeline | Manages the state of a pipeline execution. |
| PipelineCallable | A specialization of Callable that executes some sequential logic on the client machine as part of an overall Crunch pipeline in order to generate zero or more outputs, some of which may be PCollection instances that are processed by other jobs in the pipeline. |
| PipelineResult | Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline. |
| PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
| Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
| SourceTarget | An interface for classes that implement both the Source and the Target interfaces. |
| TableSource | The interface for Source implementations that return a PTable. |
| Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |
| Target.WriteMode | An enum to represent different options the client may specify for handling the case where the output path, table, etc. |
| Class | Description |
|---|---|
| Aggregator | Aggregate a sequence of values into a possibly smaller sequence of the same type. |
| CachingOptions | Options for controlling how a PCollection<T> is cached for subsequent processing. |
| CombineFn | |
| DoFn | Base class for all data processing functions in Crunch. |
| FilterFn | A DoFn for the common case of filtering the members of a PCollection based on a boolean condition. |
| GroupingOptions | Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed. |
| MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
| Pair | A convenience class for two-element Tuples. |
| ParallelDoOptions | Container class that includes optional information about a parallelDo operation applied to a PCollection. |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| PGroupedTable | The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job. |
| PipelineCallable | A specialization of Callable that executes some sequential logic on the client machine as part of an overall Crunch pipeline in order to generate zero or more outputs, some of which may be PCollection instances that are processed by other jobs in the pipeline. |
| PObject | A PObject represents a singleton object value that results from a distributed computation. |
| PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
| ReadableData | Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline. |
| Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
| SourceTarget | An interface for classes that implement both the Source and the Target interfaces. |
| TableSource | The interface for Source implementations that return a PTable. |
| Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |
| Target.WriteMode | An enum to represent different options the client may specify for handling the case where the output path, table, etc. |
| Class | Description |
|---|---|
| CachingOptions | Options for controlling how a PCollection<T> is cached for subsequent processing. |
| CreateOptions | Additional options that can be specified when creating a new PCollection using Pipeline.create(java.lang.Iterable<T>, org.apache.crunch.types.PType<T>). |
| Pair | A convenience class for two-element Tuples. |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| Pipeline | Manages the state of a pipeline execution. |
| PipelineCallable | A specialization of Callable that executes some sequential logic on the client machine as part of an overall Crunch pipeline in order to generate zero or more outputs, some of which may be PCollection instances that are processed by other jobs in the pipeline. |
| PipelineExecution | A handle to allow clients to control a Crunch pipeline as it runs. |
| PipelineResult | Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline. |
| PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
| Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
| TableSource | The interface for Source implementations that return a PTable. |
| Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |
| Target.WriteMode | An enum to represent different options the client may specify for handling the case where the output path, table, etc. |
| Class | Description |
|---|---|
| CachingOptions | Options for controlling how a PCollection<T> is cached for subsequent processing. |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| Pipeline | Manages the state of a pipeline execution. |
| PipelineExecution | A handle to allow clients to control a Crunch pipeline as it runs. |
| PipelineResult | Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline. |
| Class | Description |
|---|---|
| CachingOptions | Options for controlling how a PCollection<T> is cached for subsequent processing. |
| CombineFn | |
| CreateOptions | Additional options that can be specified when creating a new PCollection using Pipeline.create(java.lang.Iterable<T>, org.apache.crunch.types.PType<T>). |
| DoFn | Base class for all data processing functions in Crunch. |
| GroupingOptions | Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed. |
| Pair | A convenience class for two-element Tuples. |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| Pipeline | Manages the state of a pipeline execution. |
| PipelineCallable | A specialization of Callable that executes some sequential logic on the client machine as part of an overall Crunch pipeline in order to generate zero or more outputs, some of which may be PCollection instances that are processed by other jobs in the pipeline. |
| PipelineExecution | A handle to allow clients to control a Crunch pipeline as it runs. |
| PipelineExecution.Status | |
| PipelineResult | Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline. |
| PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
| Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |
| Class | Description |
|---|---|
| CombineFn | |
| CreateOptions | Additional options that can be specified when creating a new PCollection using Pipeline.create(java.lang.Iterable<T>, org.apache.crunch.types.PType<T>). |
| DoFn | Base class for all data processing functions in Crunch. |
| GroupingOptions | Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed. |
| Pair | A convenience class for two-element Tuples. |
| ParallelDoOptions | Container class that includes optional information about a parallelDo operation applied to a PCollection. |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| PGroupedTable | The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job. |
| PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
| Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
| TableSource | The interface for Source implementations that return a PTable. |
| Class | Description |
|---|---|
| CombineFn | |
| DoFn | Base class for all data processing functions in Crunch. |
| GroupingOptions | Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed. |
| MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
| Pair | A convenience class for two-element Tuples. |
| Class | Description |
|---|---|
| ReadableData | Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline. |
| Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
| SourceTarget | An interface for classes that implement both the Source and the Target interfaces. |
| TableSource | The interface for Source implementations that return a PTable. |
| TableSourceTarget | An interface for classes that implement both the TableSource and the Target interfaces. |
| Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |
| Class | Description |
|---|---|
| Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
| Class | Description |
|---|---|
| Pair | A convenience class for two-element Tuples. |
| ReadableData | Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline. |
| Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
| TableSource | The interface for Source implementations that return a PTable. |
| Class | Description |
|---|---|
| Pair | A convenience class for two-element Tuples. |
| Class | Description |
|---|---|
| Aggregator | Aggregate a sequence of values into a possibly smaller sequence of the same type. |
| CachingOptions | Options for controlling how a PCollection<T> is cached for subsequent processing. |
| DoFn | Base class for all data processing functions in Crunch. |
| GroupingOptions | Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed. |
| Pair | A convenience class for two-element Tuples. |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| PGroupedTable | The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job. |
| PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
| Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |
| Target.WriteMode | An enum to represent different options the client may specify for handling the case where the output path, table, etc. |
| Class | Description |
|---|---|
| Aggregator | Aggregate a sequence of values into a possibly smaller sequence of the same type. |
| CombineFn | |
| DoFn | Base class for all data processing functions in Crunch. |
| Emitter | Interface for writing outputs from a DoFn. |
| MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
| Pair | A convenience class for two-element Tuples. |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| PGroupedTable | The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job. |
| PObject | A PObject represents a singleton object value that results from a distributed computation. |
| PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
| Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
| Tuple3 | A convenience class for three-element Tuples. |
| Tuple3.Collect | |
| Tuple4 | A convenience class for four-element Tuples. |
| Tuple4.Collect | |
| TupleN | A Tuple instance for an arbitrary number of values. |
| Class | Description |
|---|---|
| DoFn | Base class for all data processing functions in Crunch. |
| Emitter | Interface for writing outputs from a DoFn. |
| Pair | A convenience class for two-element Tuples. |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
| Class | Description |
|---|---|
| DoFn | Base class for all data processing functions in Crunch. |
| MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
| Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
| Class | Description |
|---|---|
| DoFn | Base class for all data processing functions in Crunch. |
| GroupingOptions | Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed. |
| MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
| Pair | A convenience class for two-element Tuples. |
| Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
| Tuple3 | A convenience class for three-element Tuples. |
| Tuple4 | A convenience class for four-element Tuples. |
| TupleN | A Tuple instance for an arbitrary number of values. |
| Union | Allows us to represent the combination of multiple data sources that may contain different types of data as a single type with an index to indicate which of the original sources the current record was from. |
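
The Union description, a single type that carries an index identifying which source a record came from, can be sketched in plain Java. The `SketchUnion` class and the `combine` helper below are hypothetical and exist only for this illustration; they show the tagging idea on in-memory lists rather than the Crunch type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionSketch {
    /** Hypothetical stand-in: a value of any type, tagged with the index of its source. */
    static final class SketchUnion {
        final int index;
        final Object value;

        SketchUnion(int index, Object value) {
            this.index = index;
            this.value = value;
        }

        @Override
        public String toString() {
            return index + ":" + value;
        }
    }

    /** Merges several differently typed sources into one list, tagging each record by source position. */
    public static List<SketchUnion> combine(List<?>... sources) {
        List<SketchUnion> out = new ArrayList<>();
        for (int i = 0; i < sources.length; i++) {
            for (Object record : sources[i]) {
                out.add(new SketchUnion(i, record));
            }
        }
        return out;
    }

    /** Combines a list of names (source 0) with a list of ages (source 1). */
    public static List<String> demo() {
        List<String> names = Arrays.asList("ann", "bob");
        List<Integer> ages = Arrays.asList(31);
        List<String> labels = new ArrayList<>();
        for (SketchUnion u : combine(names, ages)) {
            labels.add(u.toString());
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0:ann, 0:bob, 1:31]
    }
}
```

A consumer of the combined list can branch on `index` to recover the original type of `value`, which is the purpose the description gives for the index.
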
| Class | Description |
|---|---|
| MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
| Pair | A convenience class for two-element Tuples. |
| Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
| Tuple3 | A convenience class for three-element Tuples. |
| Tuple4 | A convenience class for four-element Tuples. |
| TupleN | A Tuple instance for an arbitrary number of values. |
| Union | Allows us to represent the combination of multiple data sources that may contain different types of data as a single type with an index to indicate which of the original sources the current record was from. |
| Class | Description |
|---|---|
| Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
| TupleN | A Tuple instance for an arbitrary number of values. |
| Class | Description |
|---|---|
| MapFn | A DoFn for the common case of emitting exactly one value for each input record. |
| Pair | A convenience class for two-element Tuples. |
| Tuple | A fixed-size collection of Objects, used in Crunch for representing joins between PCollections. |
| Tuple3 | A convenience class for three-element Tuples. |
| Tuple4 | A convenience class for four-element Tuples. |
| TupleN | A Tuple instance for an arbitrary number of values. |
| Union | Allows us to represent the combination of multiple data sources that may contain different types of data as a single type with an index to indicate which of the original sources the current record was from. |
| Class | Description |
|---|---|
| DoFn | Base class for all data processing functions in Crunch. |
| Pair | A convenience class for two-element Tuples. |
| PCollection | A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch. |
| PipelineExecution | A handle to allow clients to control a Crunch pipeline as it runs. |
| PipelineResult | Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline. |
| PTable | A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values. |
| ReadableData | Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline. |
| Source | A Source represents an input data set that is an input to one or more MapReduce jobs. |
| SourceTarget | An interface for classes that implement both the Source and the Target interfaces. |
| TableSource | The interface for Source implementations that return a PTable. |
| Target | A Target represents the output destination of a Crunch PCollection in the context of a Crunch job. |
| Tuple3 | A convenience class for three-element Tuples. |
| Tuple4 | A convenience class for four-element Tuples. |
| TupleN | A Tuple instance for an arbitrary number of values. |
Copyright © 2017 The Apache Software Foundation. All rights reserved.