This project has retired. For details please refer to its Attic page.
Uses of Package org.apache.crunch (Apache Crunch 0.10.0 API)

Packages that use org.apache.crunch
org.apache.crunch Client-facing API and core abstractions. 
org.apache.crunch.contrib.bloomfilter Support for creating Bloom Filters. 
org.apache.crunch.contrib.io.jdbc Support for reading data from an RDBMS using JDBC. 
org.apache.crunch.contrib.text   
org.apache.crunch.examples Example applications demonstrating various aspects of Crunch. 
org.apache.crunch.fn Commonly used functions for manipulating collections. 
org.apache.crunch.impl.dist   
org.apache.crunch.impl.dist.collect   
org.apache.crunch.impl.mem In-memory Pipeline implementation for rapid prototyping and testing. 
org.apache.crunch.impl.mr A Pipeline implementation that runs on Hadoop MapReduce. 
org.apache.crunch.impl.spark   
org.apache.crunch.impl.spark.collect   
org.apache.crunch.impl.spark.fn   
org.apache.crunch.io Data input and output for Pipelines. 
org.apache.crunch.io.impl   
org.apache.crunch.lib Joining, sorting, aggregating, and other commonly used functionality. 
org.apache.crunch.lib.join Inner and outer joins on collections. 
org.apache.crunch.lib.sort   
org.apache.crunch.types Common functionality for business object serialization. 
org.apache.crunch.types.avro Business object serialization using Apache Avro. 
org.apache.crunch.types.writable Business object serialization using Hadoop's Writables framework. 
org.apache.crunch.util An assorted set of utilities. 
 

Classes in org.apache.crunch used by org.apache.crunch
Aggregator
          Aggregate a sequence of values into a possibly smaller sequence of the same type.
CachingOptions
          Options for controlling how a PCollection<T> is cached for subsequent processing.
CachingOptions.Builder
          A Builder class to use for setting the CachingOptions for a PCollection.
CombineFn
          A special DoFn implementation that converts an Iterable of values into a single value.
DoFn
          Base class for all data processing functions in Crunch.
Emitter
          Interface for writing outputs from a DoFn.
FilterFn
          A DoFn for the common case of filtering the members of a PCollection based on a boolean condition.
GroupingOptions
          Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed.
GroupingOptions.Builder
          Builder class for creating GroupingOptions instances.
MapFn
          A DoFn for the common case of emitting exactly one value for each input record.
Pair
          A convenience class for two-element Tuples.
ParallelDoOptions
          Container class that includes optional information about a parallelDo operation applied to a PCollection.
ParallelDoOptions.Builder
           
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
PGroupedTable
          The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job.
Pipeline
          Manages the state of a pipeline execution.
PipelineExecution
          A handle to allow clients to control a Crunch pipeline as it runs.
PipelineExecution.Status
           
PipelineResult
          Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline.
PipelineResult.StageResult
           
PObject
          A PObject represents a singleton object value that results from a distributed computation.
PTable
          A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
ReadableData
          Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline.
Source
          A Source represents an input data set that is an input to one or more MapReduce jobs.
SourceTarget
          An interface for classes that implement both the Source and the Target interfaces.
TableSource
          The interface for Source implementations that return a PTable.
Target
          A Target represents the output destination of a Crunch PCollection in the context of a Crunch job.
Target.WriteMode
          An enum to represent different options the client may specify for handling the case where the output path, table, etc. already exists.
Tuple
          A fixed-size collection of Objects, used in Crunch for representing joins between PCollections.
Tuple3
          A convenience class for three-element Tuples.
Tuple3.Collect
           
Tuple4
          A convenience class for four-element Tuples.
Tuple4.Collect
           
TupleN
          A Tuple instance for an arbitrary number of values.
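
The DoFn, Emitter, MapFn, and FilterFn entries above describe a small hierarchy: a general function that may emit any number of outputs per input, and two special cases built on it. The following is a minimal sketch of that relationship in plain Java. Every type here (SketchEmitter, SketchDoFn, SketchMapFn, SketchFilterFn, DoFnSketch) is a hypothetical local stand-in, not a Crunch class, and the in-memory apply helper is only an assumption about how such functions are driven; consult the Crunch Javadoc for the real signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Emitter role: receives outputs from a function.
interface SketchEmitter<T> {
    void emit(T value);
}

// Hypothetical stand-in for the DoFn role: zero or more outputs per input.
abstract class SketchDoFn<S, T> {
    abstract void process(S input, SketchEmitter<T> emitter);
}

// MapFn role: exactly one output for each input record.
abstract class SketchMapFn<S, T> extends SketchDoFn<S, T> {
    abstract T map(S input);

    @Override
    void process(S input, SketchEmitter<T> emitter) {
        emitter.emit(map(input));
    }
}

// FilterFn role: pass the input through only when accept returns true.
abstract class SketchFilterFn<T> extends SketchDoFn<T, T> {
    abstract boolean accept(T input);

    @Override
    void process(T input, SketchEmitter<T> emitter) {
        if (accept(input)) {
            emitter.emit(input);
        }
    }
}

class DoFnSketch {
    // Runs fn over an in-memory list and collects everything it emits.
    static <S, T> List<T> apply(List<S> inputs, SketchDoFn<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S input : inputs) {
            fn.process(input, out::add);
        }
        return out;
    }

    // Example map function: string length.
    static SketchMapFn<String, Integer> length() {
        return new SketchMapFn<String, Integer>() {
            @Override
            Integer map(String input) {
                return input.length();
            }
        };
    }

    // Example filter function: keep even numbers.
    static SketchFilterFn<Integer> even() {
        return new SketchFilterFn<Integer>() {
            @Override
            boolean accept(Integer input) {
                return input % 2 == 0;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(apply(Arrays.asList("pipeline", "do", "emit"), length())); // [8, 2, 4]
        System.out.println(apply(Arrays.asList(1, 2, 3, 4), even()));                 // [2, 4]
    }
}
```

Expressing the map and filter cases as subclasses that implement process keeps a single entry point for whatever executes the function, which is the design the descriptions above suggest.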
 

Classes in org.apache.crunch used by org.apache.crunch.contrib.bloomfilter
DoFn
          Base class for all data processing functions in Crunch.
Emitter
          Interface for writing outputs from a DoFn.
Pair
          A convenience class for two-element Tuples.
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
PObject
          A PObject represents a singleton object value that results from a distributed computation.
 

Classes in org.apache.crunch used by org.apache.crunch.contrib.io.jdbc
Source
          A Source represents an input data set that is an input to one or more MapReduce jobs.
 

Classes in org.apache.crunch used by org.apache.crunch.contrib.text
Pair
          A convenience class for two-element Tuples.
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
PTable
          A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
Tuple
          A fixed-size collection of Objects, used in Crunch for representing joins between PCollections.
Tuple3
          A convenience class for three-element Tuples.
Tuple4
          A convenience class for four-element Tuples.
TupleN
          A Tuple instance for an arbitrary number of values.
 

Classes in org.apache.crunch used by org.apache.crunch.examples
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
PTable
          A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
 

Classes in org.apache.crunch used by org.apache.crunch.fn
Aggregator
          Aggregate a sequence of values into a possibly smaller sequence of the same type.
CombineFn
          A special DoFn implementation that converts an Iterable of values into a single value.
DoFn
          Base class for all data processing functions in Crunch.
Emitter
          Interface for writing outputs from a DoFn.
FilterFn
          A DoFn for the common case of filtering the members of a PCollection based on a boolean condition.
MapFn
          A DoFn for the common case of emitting exactly one value for each input record.
Pair
          A convenience class for two-element Tuples.
Tuple3
          A convenience class for three-element Tuples.
Tuple4
          A convenience class for four-element Tuples.
TupleN
          A Tuple instance for an arbitrary number of values.
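
The Aggregator entry above describes folding a sequence of values into a possibly smaller sequence of the same type. A rough sketch of that contract in plain Java might look like the following. SketchAggregator, SumLongs, and AggregatorSketch are hypothetical stand-ins written for illustration, and the reset/update/results method set is an assumption modeled on the description rather than a copy of the Crunch interface.

```java
import java.util.Arrays;
import java.util.Collections;

// Hypothetical stand-in for the Aggregator role; not the Crunch interface itself.
interface SketchAggregator<T> {
    void reset();            // clear state before a new group of values
    void update(T value);    // fold one value into the running state
    Iterable<T> results();   // the (possibly smaller) output sequence
}

// Sums a sequence of Long values down to a single Long.
class SumLongs implements SketchAggregator<Long> {
    private long sum;

    @Override
    public void reset() {
        sum = 0L;
    }

    @Override
    public void update(Long value) {
        sum += value;
    }

    @Override
    public Iterable<Long> results() {
        return Collections.singletonList(sum);
    }
}

class AggregatorSketch {
    // Drives an aggregator over one group of values.
    static <T> Iterable<T> aggregate(Iterable<T> values, SketchAggregator<T> agg) {
        agg.reset();
        for (T value : values) {
            agg.update(value);
        }
        return agg.results();
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Arrays.asList(1L, 2L, 3L), new SumLongs())); // [6]
    }
}
```

Because input and output share the type T, the same aggregator can in principle be applied to partial results as well as raw values, which is what makes this shape suitable for combiner-style use.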
 

Classes in org.apache.crunch used by org.apache.crunch.impl.dist
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
Pipeline
          Manages the state of a pipeline execution.
PipelineResult
          Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline.
PTable
          A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
Source
          A Source represents an input data set that is an input to one or more MapReduce jobs.
SourceTarget
          An interface for classes that implement both the Source and the Target interfaces.
TableSource
          The interface for Source implementations that return a PTable.
Target
          A Target represents the output destination of a Crunch PCollection in the context of a Crunch job.
Target.WriteMode
          An enum to represent different options the client may specify for handling the case where the output path, table, etc. already exists.
 

Classes in org.apache.crunch used by org.apache.crunch.impl.dist.collect
Aggregator
          Aggregate a sequence of values into a possibly smaller sequence of the same type.
CachingOptions
          Options for controlling how a PCollection<T> is cached for subsequent processing.
CombineFn
          A special DoFn implementation that converts an Iterable of values into a single value.
DoFn
          Base class for all data processing functions in Crunch.
FilterFn
          A DoFn for the common case of filtering the members of a PCollection based on a boolean condition.
GroupingOptions
          Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed.
MapFn
          A DoFn for the common case of emitting exactly one value for each input record.
Pair
          A convenience class for two-element Tuples.
ParallelDoOptions
          Container class that includes optional information about a parallelDo operation applied to a PCollection.
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
PGroupedTable
          The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job.
PObject
          A PObject represents a singleton object value that results from a distributed computation.
PTable
          A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
ReadableData
          Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline.
Source
          A Source represents an input data set that is an input to one or more MapReduce jobs.
SourceTarget
          An interface for classes that implement both the Source and the Target interfaces.
TableSource
          The interface for Source implementations that return a PTable.
Target
          A Target represents the output destination of a Crunch PCollection in the context of a Crunch job.
Target.WriteMode
          An enum to represent different options the client may specify for handling the case where the output path, table, etc. already exists.
 

Classes in org.apache.crunch used by org.apache.crunch.impl.mem
CachingOptions
          Options for controlling how a PCollection<T> is cached for subsequent processing.
Pair
          A convenience class for two-element Tuples.
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
Pipeline
          Manages the state of a pipeline execution.
PipelineExecution
          A handle to allow clients to control a Crunch pipeline as it runs.
PipelineResult
          Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline.
PTable
          A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
Source
          A Source represents an input data set that is an input to one or more MapReduce jobs.
TableSource
          The interface for Source implementations that return a PTable.
Target
          A Target represents the output destination of a Crunch PCollection in the context of a Crunch job.
Target.WriteMode
          An enum to represent different options the client may specify for handling the case where the output path, table, etc. already exists.
 

Classes in org.apache.crunch used by org.apache.crunch.impl.mr
CachingOptions
          Options for controlling how a PCollection<T> is cached for subsequent processing.
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
Pipeline
          Manages the state of a pipeline execution.
PipelineExecution
          A handle to allow clients to control a Crunch pipeline as it runs.
PipelineResult
          Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline.
 

Classes in org.apache.crunch used by org.apache.crunch.impl.spark
CachingOptions
          Options for controlling how a PCollection<T> is cached for subsequent processing.
CombineFn
          A special DoFn implementation that converts an Iterable of values into a single value.
DoFn
          Base class for all data processing functions in Crunch.
GroupingOptions
          Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed.
Pair
          A convenience class for two-element Tuples.
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
Pipeline
          Manages the state of a pipeline execution.
PipelineExecution
          A handle to allow clients to control a Crunch pipeline as it runs.
PipelineExecution.Status
           
PipelineResult
          Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline.
PTable
          A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
Target
          A Target represents the output destination of a Crunch PCollection in the context of a Crunch job.
 

Classes in org.apache.crunch used by org.apache.crunch.impl.spark.collect
CombineFn
          A special DoFn implementation that converts an Iterable of values into a single value.
DoFn
          Base class for all data processing functions in Crunch.
GroupingOptions
          Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed.
Pair
          A convenience class for two-element Tuples.
ParallelDoOptions
          Container class that includes optional information about a parallelDo operation applied to a PCollection.
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
PGroupedTable
          The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job.
PTable
          A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
Source
          A Source represents an input data set that is an input to one or more MapReduce jobs.
TableSource
          The interface for Source implementations that return a PTable.
 

Classes in org.apache.crunch used by org.apache.crunch.impl.spark.fn
CombineFn
          A special DoFn implementation that converts an Iterable of values into a single value.
DoFn
          Base class for all data processing functions in Crunch.
GroupingOptions
          Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed.
MapFn
          A DoFn for the common case of emitting exactly one value for each input record.
Pair
          A convenience class for two-element Tuples.
 

Classes in org.apache.crunch used by org.apache.crunch.io
ReadableData
          Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline.
Source
          A Source represents an input data set that is an input to one or more MapReduce jobs.
SourceTarget
          An interface for classes that implement both the Source and the Target interfaces.
TableSource
          The interface for Source implementations that return a PTable.
TableSourceTarget
          An interface for classes that implement both the TableSource and the Target interfaces.
Target
          A Target represents the output destination of a Crunch PCollection in the context of a Crunch job.
 

Classes in org.apache.crunch used by org.apache.crunch.io.impl
Source
          A Source represents an input data set that is an input to one or more MapReduce jobs.
 

Classes in org.apache.crunch used by org.apache.crunch.lib
Aggregator
          Aggregate a sequence of values into a possibly smaller sequence of the same type.
CombineFn
          A special DoFn implementation that converts an Iterable of values into a single value.
DoFn
          Base class for all data processing functions in Crunch.
Emitter
          Interface for writing outputs from a DoFn.
MapFn
          A DoFn for the common case of emitting exactly one value for each input record.
Pair
          A convenience class for two-element Tuples.
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
PGroupedTable
          The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job.
PObject
          A PObject represents a singleton object value that results from a distributed computation.
PTable
          A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
Tuple
          A fixed-size collection of Objects, used in Crunch for representing joins between PCollections.
Tuple3
          A convenience class for three-element Tuples.
Tuple3.Collect
           
Tuple4
          A convenience class for four-element Tuples.
Tuple4.Collect
           
TupleN
          A Tuple instance for an arbitrary number of values.
 

Classes in org.apache.crunch used by org.apache.crunch.lib.join
DoFn
          Base class for all data processing functions in Crunch.
Emitter
          Interface for writing outputs from a DoFn.
Pair
          A convenience class for two-element Tuples.
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
PTable
          A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
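
The join package works on PTable, described above as a distributed multi-map of keys and values, and produces paired values for matching keys. The semantics of an inner join over two such multi-maps can be sketched in memory as below. This is only an illustration of the result shape: JoinSketch and innerJoin are hypothetical names, java.util.Map.Entry stands in for Pair, lists of entries stand in for PTable, and the nested loop is a swapped-in technique that says nothing about how Crunch executes joins.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

class JoinSketch {
    // Inner join: one output per (left, right) combination that shares a key.
    // Keys present on only one side produce no output.
    static <K, U, V> List<Entry<K, Entry<U, V>>> innerJoin(
            List<Entry<K, U>> left, List<Entry<K, V>> right) {
        List<Entry<K, Entry<U, V>>> out = new ArrayList<>();
        for (Entry<K, U> l : left) {
            for (Entry<K, V> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    Entry<U, V> values = new SimpleEntry<>(l.getValue(), r.getValue());
                    out.add(new SimpleEntry<>(l.getKey(), values));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry<String, Integer>> left = new ArrayList<>();
        left.add(new SimpleEntry<>("a", 1));
        left.add(new SimpleEntry<>("b", 2));

        List<Entry<String, String>> right = new ArrayList<>();
        right.add(new SimpleEntry<>("a", "x"));
        right.add(new SimpleEntry<>("a", "y"));
        right.add(new SimpleEntry<>("c", "z"));

        // Key "a" matches twice; "b" and "c" have no partner and are dropped.
        System.out.println(innerJoin(left, right)); // [a=1=x, a=1=y]
    }
}
```

Because a multi-map may hold several values per key, a single key on one side can contribute to several output records, as the repeated "a" key shows.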
 

Classes in org.apache.crunch used by org.apache.crunch.lib.sort
DoFn
          Base class for all data processing functions in Crunch.
MapFn
          A DoFn for the common case of emitting exactly one value for each input record.
Tuple
          A fixed-size collection of Objects, used in Crunch for representing joins between PCollections.
 

Classes in org.apache.crunch used by org.apache.crunch.types
DoFn
          Base class for all data processing functions in Crunch.
GroupingOptions
          Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed.
MapFn
          A DoFn for the common case of emitting exactly one value for each input record.
Pair
          A convenience class for two-element Tuples.
Tuple
          A fixed-size collection of Objects, used in Crunch for representing joins between PCollections.
Tuple3
          A convenience class for three-element Tuples.
Tuple4
          A convenience class for four-element Tuples.
TupleN
          A Tuple instance for an arbitrary number of values.
Union
          Allows us to represent the combination of multiple data sources that may contain different types of data as a single type with an index to indicate which of the original sources the current record was from.
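
The Union entry above describes a single type that carries a record together with an index identifying which source it came from. A minimal sketch of that idea in plain Java follows. SketchUnion and UnionSketch are hypothetical stand-ins, and the getIndex/getValue accessor names are assumptions chosen for illustration; the real class may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Union idea; not the Crunch class itself.
final class SketchUnion {
    private final int index;    // position of the source the record came from
    private final Object value; // the record itself; its type depends on the source

    SketchUnion(int index, Object value) {
        this.index = index;
        this.value = value;
    }

    int getIndex() {
        return index;
    }

    Object getValue() {
        return value;
    }
}

class UnionSketch {
    // Tags every record of every source with that source's position.
    static List<SketchUnion> combine(List<List<?>> sources) {
        List<SketchUnion> out = new ArrayList<>();
        for (int i = 0; i < sources.size(); i++) {
            for (Object record : sources.get(i)) {
                out.add(new SketchUnion(i, record));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<?>> sources = new ArrayList<>();
        sources.add(Arrays.asList("a", "b")); // source 0 holds strings
        sources.add(Arrays.asList(42));       // source 1 holds integers

        for (SketchUnion u : combine(sources)) {
            // A consumer switches on the index to recover the original type.
            System.out.println(u.getIndex() + " -> " + u.getValue());
        }
    }
}
```

Carrying the index alongside the value lets differently typed inputs travel through one collection while still allowing each record to be decoded according to its source.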
 

Classes in org.apache.crunch used by org.apache.crunch.types.avro
MapFn
          A DoFn for the common case of emitting exactly one value for each input record.
Pair
          A convenience class for two-element Tuples.
Tuple
          A fixed-size collection of Objects, used in Crunch for representing joins between PCollections.
Tuple3
          A convenience class for three-element Tuples.
Tuple4
          A convenience class for four-element Tuples.
TupleN
          A Tuple instance for an arbitrary number of values.
Union
          Allows us to represent the combination of multiple data sources that may contain different types of data as a single type with an index to indicate which of the original sources the current record was from.
 

Classes in org.apache.crunch used by org.apache.crunch.types.writable
MapFn
          A DoFn for the common case of emitting exactly one value for each input record.
Pair
          A convenience class for two-element Tuples.
Tuple
          A fixed-size collection of Objects, used in Crunch for representing joins between PCollections.
Tuple3
          A convenience class for three-element Tuples.
Tuple4
          A convenience class for four-element Tuples.
TupleN
          A Tuple instance for an arbitrary number of values.
Union
          Allows us to represent the combination of multiple data sources that may contain different types of data as a single type with an index to indicate which of the original sources the current record was from.
 

Classes in org.apache.crunch used by org.apache.crunch.util
DoFn
          Base class for all data processing functions in Crunch.
Pair
          A convenience class for two-element Tuples.
PCollection
          A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
PipelineExecution
          A handle to allow clients to control a Crunch pipeline as it runs.
PipelineResult
          Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline.
PTable
          A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
ReadableData
          Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline.
Source
          A Source represents an input data set that is an input to one or more MapReduce jobs.
SourceTarget
          An interface for classes that implement both the Source and the Target interfaces.
TableSource
          The interface for Source implementations that return a PTable.
Target
          A Target represents the output destination of a Crunch PCollection in the context of a Crunch job.
Tuple3
          A convenience class for three-element Tuples.
Tuple4
          A convenience class for four-element Tuples.
TupleN
          A Tuple instance for an arbitrary number of values.
 



Copyright © 2014 The Apache Software Foundation. All Rights Reserved.