This project has retired. For details please refer to its Attic page.
Index (Apache Crunch 0.10.0 API)

A

AbstractCompositeExtractor<T> - Class in org.apache.crunch.contrib.text
Base class for Extractor instances that delegates the parsing of fields to other Extractor instances, primarily used for constructing composite records that implement the Tuple interface.
AbstractCompositeExtractor(TokenizerFactory, List<Extractor<?>>) - Constructor for class org.apache.crunch.contrib.text.AbstractCompositeExtractor
 
AbstractSimpleExtractor<T> - Class in org.apache.crunch.contrib.text
Base class for the common case Extractor instances that construct a single object from a block of text stored in a String, with support for error handling and reporting.
accept(T) - Method in class org.apache.crunch.FilterFn
If true, emit the given record.
accept(PCollectionImpl.Visitor) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
accept(OutputHandler, PType<?>) - Method in interface org.apache.crunch.Target
Checks to see if this Target instance is compatible with the given PType.
ACCEPT_ALL() - Static method in class org.apache.crunch.fn.FilterFns
Accept everything.
addAccumulator(Map<String, Map<String, Long>>, Map<String, Map<String, Long>>) - Method in class org.apache.crunch.impl.spark.CounterAccumulatorParam
 
addCacheFile(Path, Configuration) - Static method in class org.apache.crunch.util.DistCache
 
addInPlace(Map<String, Map<String, Long>>, Map<String, Map<String, Long>>) - Method in class org.apache.crunch.impl.spark.CounterAccumulatorParam
 
addInputPath(Job, Path, FormatBundle, int) - Static method in class org.apache.crunch.io.CrunchInputs
 
addInputPaths(Job, Collection<Path>, FormatBundle, int) - Static method in class org.apache.crunch.io.CrunchInputs
 
addJarDirToDistributedCache(Configuration, File) - Static method in class org.apache.crunch.util.DistCache
Adds all jars under the specified directory to the distributed cache of jobs using the provided configuration.
addJarDirToDistributedCache(Configuration, String) - Static method in class org.apache.crunch.util.DistCache
Adds all jars under the directory at the specified path to the distributed cache of jobs using the provided configuration.
addJarToDistributedCache(Configuration, File) - Static method in class org.apache.crunch.util.DistCache
Adds the specified jar to the distributed cache of jobs using the provided configuration.
addJarToDistributedCache(Configuration, String) - Static method in class org.apache.crunch.util.DistCache
Adds the jar at the specified path to the distributed cache of jobs using the provided configuration.
addNamedOutput(Job, String, Class<? extends OutputFormat>, Class, Class) - Static method in class org.apache.crunch.io.CrunchOutputs
 
addNamedOutput(Job, String, FormatBundle<? extends OutputFormat>, Class, Class) - Static method in class org.apache.crunch.io.CrunchOutputs
 
aggregate(Aggregator<S>) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
Aggregate - Class in org.apache.crunch.lib
Methods for performing various types of aggregations over PCollection instances.
Aggregate() - Constructor for class org.apache.crunch.lib.Aggregate
 
aggregate(PCollection<S>, Aggregator<S>) - Static method in class org.apache.crunch.lib.Aggregate
 
aggregate(Aggregator<S>) - Method in interface org.apache.crunch.PCollection
Returns a PCollection that contains the result of aggregating all values in this instance.
Aggregate.PairValueComparator<K,V> - Class in org.apache.crunch.lib
 
Aggregate.PairValueComparator(boolean) - Constructor for class org.apache.crunch.lib.Aggregate.PairValueComparator
 
Aggregate.TopKCombineFn<K,V> - Class in org.apache.crunch.lib
 
Aggregate.TopKCombineFn(int, boolean, PType<Pair<K, V>>) - Constructor for class org.apache.crunch.lib.Aggregate.TopKCombineFn
 
Aggregate.TopKFn<K,V> - Class in org.apache.crunch.lib
 
Aggregate.TopKFn(int, boolean, PType<Pair<K, V>>) - Constructor for class org.apache.crunch.lib.Aggregate.TopKFn
 
Aggregator<T> - Interface in org.apache.crunch
Aggregate a sequence of values into a possibly smaller sequence of the same type.
Aggregators - Class in org.apache.crunch.fn
A collection of pre-defined Aggregators.
Aggregators.SimpleAggregator<T> - Class in org.apache.crunch.fn
Base class for aggregators that do not require any initialization.
Aggregators.SimpleAggregator() - Constructor for class org.apache.crunch.fn.Aggregators.SimpleAggregator
 
and(FilterFn<S>, FilterFn<S>) - Static method in class org.apache.crunch.fn.FilterFns
Accept an entry if all of the given filters accept it, using short-circuit evaluation.
and(FilterFn<S>...) - Static method in class org.apache.crunch.fn.FilterFns
Accept an entry if all of the given filters accept it, using short-circuit evaluation.
apply(Statement, Description) - Method in class org.apache.crunch.test.TemporaryPath
 
applyPTypeTransforms() - Method in interface org.apache.crunch.types.Converter
If true, apply the associated PType#getInputMapFn to inputs after this Converter instance reads them, and the associated PType#getOutputMapFn to outputs before this Converter instance writes them.
as(PType<T>) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
as(PType<T>) - Method in interface org.apache.crunch.types.PTypeFamily
Returns the equivalent of the given ptype for this family, if it exists.
as(PType<T>) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
asCollection() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
asCollection() - Method in interface org.apache.crunch.PCollection
 
asMap() - Method in class org.apache.crunch.impl.dist.collect.PTableBase
Returns a PObject encapsulating a Map made up of the keys and values in this PTable.
asMap() - Method in interface org.apache.crunch.PTable
Returns a PObject encapsulating a Map made up of the keys and values in this PTable.
asPTable(PCollection<Pair<K, V>>) - Static method in class org.apache.crunch.lib.PTables
Convert the given PCollection<Pair<K, V>> to a PTable<K, V>.
asReadable(boolean) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
asReadable() - Method in interface org.apache.crunch.io.ReadableSource
 
asReadable(boolean) - Method in interface org.apache.crunch.PCollection
 
asSourceTarget(PType<T>) - Method in interface org.apache.crunch.Target
Attempt to create the SourceTarget type that corresponds to this Target for the given PType, if possible.
At - Class in org.apache.crunch.io
Static factory methods for creating common SourceTarget types, which may be treated as both a Source and a Target.
At() - Constructor for class org.apache.crunch.io.At
 
AverageBytesByIP - Class in org.apache.crunch.examples
 
AverageBytesByIP() - Constructor for class org.apache.crunch.examples.AverageBytesByIP
 
AVRO_MODE_PROPERTY - Static variable in class org.apache.crunch.types.avro.AvroMode
 
AVRO_SHUFFLE_MODE_PROPERTY - Static variable in class org.apache.crunch.types.avro.AvroMode
 
AvroDerivedValueDeepCopier<T,S> - Class in org.apache.crunch.types.avro
A DeepCopier specific to Avro derived types.
AvroDerivedValueDeepCopier(MapFn<T, S>, MapFn<S, T>, AvroType<S>) - Constructor for class org.apache.crunch.types.avro.AvroDerivedValueDeepCopier
 
avroFile(String, Class<T>) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<T> instance from the Avro file(s) at the given path name.
avroFile(Path, Class<T>) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<T> instance from the Avro file(s) at the given Path.
avroFile(String) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path.
avroFile(Path) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path.
avroFile(Path, Configuration) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path using the FileSystem information contained in the given Configuration instance.
avroFile(String, PType<T>) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<T> instance from the Avro file(s) at the given path name.
avroFile(Path, PType<T>) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<T> instance from the Avro file(s) at the given Path.
avroFile(String, Class<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance from the Avro file(s) at the given path name.
avroFile(Path, Class<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance from the Avro file(s) at the given Path.
avroFile(List<Path>, Class<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance from the Avro file(s) at the given Paths.
avroFile(String, PType<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance from the Avro file(s) at the given path name.
avroFile(Path, PType<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance from the Avro file(s) at the given Path.
avroFile(List<Path>, PType<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance from the Avro file(s) at the given Paths.
avroFile(String) - Static method in class org.apache.crunch.io.From
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path.
avroFile(Path) - Static method in class org.apache.crunch.io.From
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path.
avroFile(List<Path>) - Static method in class org.apache.crunch.io.From
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given paths.
avroFile(Path, Configuration) - Static method in class org.apache.crunch.io.From
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path using the FileSystem information contained in the given Configuration instance.
avroFile(List<Path>, Configuration) - Static method in class org.apache.crunch.io.From
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given paths using the FileSystem information contained in the given Configuration instance.
avroFile(String) - Static method in class org.apache.crunch.io.To
Creates a Target at the given path name that writes data to Avro files.
avroFile(Path) - Static method in class org.apache.crunch.io.To
Creates a Target at the given Path that writes data to Avro files.
AvroInputFormat<T> - Class in org.apache.crunch.types.avro
An InputFormat for Avro data files.
AvroInputFormat() - Constructor for class org.apache.crunch.types.avro.AvroInputFormat
 
AvroMode - Class in org.apache.crunch.types.avro
AvroMode is an immutable object used for configuring the reading and writing of Avro types.
AvroMode.ModeType - Enum in org.apache.crunch.types.avro
Internal enum which represents the various Avro data types.
AvroOutputFormat<T> - Class in org.apache.crunch.types.avro
An OutputFormat for Avro data files.
AvroOutputFormat() - Constructor for class org.apache.crunch.types.avro.AvroOutputFormat
 
AvroPathPerKeyOutputFormat<T> - Class in org.apache.crunch.types.avro
A FileOutputFormat that takes in a Utf8 and an Avro record and writes the Avro records to a sub-directory of the output path whose name is equal to the string-form of the Utf8.
AvroPathPerKeyOutputFormat() - Constructor for class org.apache.crunch.types.avro.AvroPathPerKeyOutputFormat
 
Avros - Class in org.apache.crunch.types.avro
Defines static methods that are analogous to the methods defined in AvroTypeFamily for convenient static importing.
AvroSerDe<T> - Class in org.apache.crunch.impl.spark.serde
 
AvroSerDe(AvroType<T>) - Constructor for class org.apache.crunch.impl.spark.serde.AvroSerDe
 
AvroTextOutputFormat<K,V> - Class in org.apache.crunch.types.avro
 
AvroTextOutputFormat() - Constructor for class org.apache.crunch.types.avro.AvroTextOutputFormat
 
AvroType<T> - Class in org.apache.crunch.types.avro
The implementation of the PType interface for Avro-based serialization.
AvroType(Class<T>, Schema, DeepCopier<T>, PType...) - Constructor for class org.apache.crunch.types.avro.AvroType
 
AvroType(Class<T>, Schema, MapFn, MapFn, DeepCopier<T>, AvroType.AvroRecordType, PType...) - Constructor for class org.apache.crunch.types.avro.AvroType
 
AvroType.AvroRecordType - Enum in org.apache.crunch.types.avro
 
AvroTypeFamily - Class in org.apache.crunch.types.avro
 
AvroUtf8InputFormat - Class in org.apache.crunch.types.avro
An InputFormat for text files.
AvroUtf8InputFormat() - Constructor for class org.apache.crunch.types.avro.AvroUtf8InputFormat
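The two `and` entries for FilterFns in this section describe accepting a record only if all filters accept it, using short-circuit evaluation. The following is a minimal plain-Java sketch of that idea, not Crunch code: `java.util.function.Predicate` stands in for `FilterFn`, and the class `ShortCircuitAnd` with its `counted`, `and`, and `filter` helpers are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of a short-circuit "and" over filters. Predicate stands in
// for Crunch's FilterFn; ShortCircuitAnd and its members are hypothetical names.
class ShortCircuitAnd {

    // Counts how many individual filter evaluations were performed.
    static int evaluations = 0;

    // Wraps a predicate so that every evaluation is counted.
    static <T> Predicate<T> counted(Predicate<T> p) {
        return t -> {
            evaluations++;
            return p.test(t);
        };
    }

    // Accepts a value only if every filter accepts it; stops at the first rejection.
    @SafeVarargs
    static <T> Predicate<T> and(Predicate<T>... filters) {
        return t -> {
            for (Predicate<T> f : filters) {
                if (!f.test(t)) {
                    return false; // short-circuit: later filters are not evaluated
                }
            }
            return true;
        };
    }

    // Keeps only the records accepted by the filter.
    static <T> List<T> filter(List<T> input, Predicate<T> f) {
        List<T> out = new ArrayList<>();
        for (T t : input) {
            if (f.test(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Predicate<Integer> positive = counted(x -> x > 0);
        Predicate<Integer> even = counted(x -> x % 2 == 0);
        List<Integer> kept = filter(Arrays.asList(-2, 3, 4), and(positive, even));
        System.out.println(kept + " after " + evaluations + " evaluations");
    }
}
```

In this sketch the second filter is never evaluated for a record that the first filter rejects, which is the practical effect of short-circuiting: placing the cheapest or most selective filter first can reduce the total work.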
 

B

BaseDoCollection<S> - Class in org.apache.crunch.impl.dist.collect
 
BaseDoTable<K,V> - Class in org.apache.crunch.impl.dist.collect
 
BaseGroupedTable<K,V> - Class in org.apache.crunch.impl.dist.collect
 
BaseInputCollection<S> - Class in org.apache.crunch.impl.dist.collect
 
BaseInputCollection(Source<S>, DistributedPipeline) - Constructor for class org.apache.crunch.impl.dist.collect.BaseInputCollection
 
BaseInputTable<K,V> - Class in org.apache.crunch.impl.dist.collect
 
BaseInputTable(TableSource<K, V>, DistributedPipeline) - Constructor for class org.apache.crunch.impl.dist.collect.BaseInputTable
 
BaseUnionCollection<S> - Class in org.apache.crunch.impl.dist.collect
 
BaseUnionTable<K,V> - Class in org.apache.crunch.impl.dist.collect
 
bigInt(PTypeFamily) - Static method in class org.apache.crunch.types.PTypes
A PType for Java's BigInteger type.
BIGINT_TO_BYTE - Static variable in class org.apache.crunch.types.PTypes
 
BloomFilterFactory - Class in org.apache.crunch.contrib.bloomfilter
Factory class for creating BloomFilters.
BloomFilterFactory() - Constructor for class org.apache.crunch.contrib.bloomfilter.BloomFilterFactory
 
BloomFilterFn<S> - Class in org.apache.crunch.contrib.bloomfilter
This class is responsible for generating the keys that are used in a BloomFilter.
BloomFilterFn() - Constructor for class org.apache.crunch.contrib.bloomfilter.BloomFilterFn
 
BloomFilterJoinStrategy<K,U,V> - Class in org.apache.crunch.lib.join
Join strategy that uses a Bloom filter that is trained on the keys of the left-side table to filter the key/value pairs of the right-side table before sending through the shuffle and reduce phase.
BloomFilterJoinStrategy(int) - Constructor for class org.apache.crunch.lib.join.BloomFilterJoinStrategy
Instantiate with the expected number of unique keys in the left table.
BloomFilterJoinStrategy(int, float) - Constructor for class org.apache.crunch.lib.join.BloomFilterJoinStrategy
Instantiate with the expected number of unique keys in the left table, and the acceptable false positive rate for the Bloom filter.
BloomFilterJoinStrategy(int, float, JoinStrategy<K, U, V>) - Constructor for class org.apache.crunch.lib.join.BloomFilterJoinStrategy
Instantiate with the expected number of unique keys in the left table, and the acceptable false positive rate for the Bloom filter, and an underlying join strategy to delegate to.
booleans() - Static method in class org.apache.crunch.types.avro.Avros
 
booleans() - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
booleans() - Method in interface org.apache.crunch.types.PTypeFamily
 
booleans() - Static method in class org.apache.crunch.types.writable.Writables
 
booleans() - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
bottom(int) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
bottom(int) - Method in interface org.apache.crunch.PTable
Returns a PTable made up of the pairs in this PTable with the smallest value field.
build() - Method in class org.apache.crunch.CachingOptions.Builder
 
build() - Method in class org.apache.crunch.contrib.text.TokenizerFactory.Builder
Returns a new TokenizerFactory with settings determined by this Builder instance.
build() - Method in class org.apache.crunch.GroupingOptions.Builder
 
build() - Method in class org.apache.crunch.ParallelDoOptions.Builder
 
builder() - Static method in class org.apache.crunch.CachingOptions
Creates a new CachingOptions.Builder instance to use for specifying the caching options for a particular PCollection<T>.
builder() - Static method in class org.apache.crunch.contrib.text.TokenizerFactory
Factory method for creating a TokenizerFactory.Builder instance.
builder() - Static method in class org.apache.crunch.GroupingOptions
 
builder() - Static method in class org.apache.crunch.ParallelDoOptions
 
by(MapFn<S, K>, PType<K>) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
by(String, MapFn<S, K>, PType<K>) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
by(int, Sort.Order) - Static method in class org.apache.crunch.lib.Sort.ColumnOrder
 
by(MapFn<S, K>, PType<K>) - Method in interface org.apache.crunch.PCollection
Apply the given map function to each element of this instance in order to create a PTable.
by(String, MapFn<S, K>, PType<K>) - Method in interface org.apache.crunch.PCollection
Apply the given map function to each element of this instance in order to create a PTable.
BYTE_TO_BIGINT - Static variable in class org.apache.crunch.types.PTypes
 
ByteArray - Class in org.apache.crunch.impl.spark
 
ByteArray(byte[]) - Constructor for class org.apache.crunch.impl.spark.ByteArray
 
bytes() - Static method in class org.apache.crunch.types.avro.Avros
 
bytes() - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
bytes() - Method in interface org.apache.crunch.types.PTypeFamily
 
bytes() - Static method in class org.apache.crunch.types.writable.Writables
 
bytes() - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
BYTES_IN - Static variable in class org.apache.crunch.types.avro.Avros
 
BYTES_PER_REDUCE_TASK - Static variable in class org.apache.crunch.util.PartitionUtils
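The BloomFilterJoinStrategy entry in this section describes training a Bloom filter on the keys of the left-side table and using it to drop right-side pairs before the shuffle. The following is a minimal plain-Java sketch of that pre-filtering idea only. It does not use or reproduce Crunch's implementation: `BloomPrefilterSketch`, `TinyBloom`, `prefilterRightKeys`, the two-probe hashing scheme, and the `bloomBits` parameter are all assumptions made for illustration, and the expected-key-count and false-positive-rate parameters of the real constructors are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Plain-Java model of a Bloom-filter pre-filter for a join. All names here are
// hypothetical; this is not Crunch's BloomFilterJoinStrategy.
class BloomPrefilterSketch {

    // A tiny Bloom filter over strings with two hash probes per key.
    static final class TinyBloom {
        private final BitSet bits;
        private final int size;

        TinyBloom(int size) {
            this.size = size;
            this.bits = new BitSet(size);
        }

        // Derives a bit position from the key and a seed (illustrative mixing only).
        private int probe(String key, int seed) {
            int h = key.hashCode() * 31 + seed * 0x9E3779B9;
            return Math.floorMod(h, size);
        }

        void add(String key) {
            bits.set(probe(key, 1));
            bits.set(probe(key, 2));
        }

        // May return true for a key that was never added (false positive),
        // but never returns false for a key that was added.
        boolean mightContain(String key) {
            return bits.get(probe(key, 1)) && bits.get(probe(key, 2));
        }
    }

    // Trains the filter on left-side keys, then drops right-side keys that the
    // filter rules out, before any more expensive join step would run.
    static List<String> prefilterRightKeys(List<String> leftKeys,
                                           List<String> rightKeys,
                                           int bloomBits) {
        TinyBloom bloom = new TinyBloom(bloomBits);
        for (String k : leftKeys) {
            bloom.add(k);
        }
        List<String> survivors = new ArrayList<>();
        for (String k : rightKeys) {
            if (bloom.mightContain(k)) {
                survivors.add(k);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a", "b");
        List<String> right = Arrays.asList("a", "x", "b", "y", "z");
        System.out.println(prefilterRightKeys(left, right, 1024));
    }
}
```

Because a Bloom filter can report false positives but not false negatives, the surviving right-side keys always include every key that is present on the left, and may include a few extra keys that the actual join would then discard.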
 

C

cache() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
cache(CachingOptions) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
cache() - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
cache(CachingOptions) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
cache(PCollection<T>, CachingOptions) - Method in class org.apache.crunch.impl.mem.MemPipeline
 
cache(PCollection<T>, CachingOptions) - Method in class org.apache.crunch.impl.mr.MRPipeline
 
cache(PCollection<T>, CachingOptions) - Method in class org.apache.crunch.impl.spark.SparkPipeline
 
cache() - Method in interface org.apache.crunch.PCollection
Marks this data as cached using the default CachingOptions.
cache(CachingOptions) - Method in interface org.apache.crunch.PCollection
Marks this data as cached using the given CachingOptions.
cache(PCollection<T>, CachingOptions) - Method in interface org.apache.crunch.Pipeline
Caches the given PCollection so that it will be processed at most once during pipeline execution.
cache() - Method in interface org.apache.crunch.PTable
 
cache(CachingOptions) - Method in interface org.apache.crunch.PTable
 
CachingOptions - Class in org.apache.crunch
Options for controlling how a PCollection<T> is cached for subsequent processing.
CachingOptions.Builder - Class in org.apache.crunch
A Builder class to use for setting the CachingOptions for a PCollection.
CachingOptions.Builder() - Constructor for class org.apache.crunch.CachingOptions.Builder
 
call(Tuple2<IntByteArray, List<byte[]>>) - Method in class org.apache.crunch.impl.spark.collect.ToByteArrayFunction
 
call(Iterator<Tuple2<K, V>>) - Method in class org.apache.crunch.impl.spark.fn.CombineMapsideFunction
 
call(Iterator<S>) - Method in class org.apache.crunch.impl.spark.fn.FlatMapDoFn
 
call(Iterator<Tuple2<K, V>>) - Method in class org.apache.crunch.impl.spark.fn.FlatMapPairDoFn
 
call(Tuple2<K, V>) - Method in class org.apache.crunch.impl.spark.fn.InputConverterFunction
 
call(Object) - Method in class org.apache.crunch.impl.spark.fn.MapFunction
 
call(Pair<K, V>) - Method in class org.apache.crunch.impl.spark.fn.MapOutputFunction
 
call(S) - Method in class org.apache.crunch.impl.spark.fn.OutputConverterFunction
 
call(Iterator<T>) - Method in class org.apache.crunch.impl.spark.fn.PairFlatMapDoFn
 
call(Iterator<Tuple2<K, V>>) - Method in class org.apache.crunch.impl.spark.fn.PairFlatMapPairDoFn
 
call(Tuple2<K, V>) - Method in class org.apache.crunch.impl.spark.fn.PairMapFunction
 
call(Pair<K, List<V>>) - Method in class org.apache.crunch.impl.spark.fn.PairMapIterableFunction
 
call(Pair<K, V>) - Method in class org.apache.crunch.impl.spark.fn.PartitionedMapOutputFunction
 
call(Iterator<Tuple2<ByteArray, List<byte[]>>>) - Method in class org.apache.crunch.impl.spark.fn.ReduceGroupingFunction
 
call(Tuple2<ByteArray, List<byte[]>>) - Method in class org.apache.crunch.impl.spark.fn.ReduceInputFunction
 
CAN_COMBINE_SPECIFIC_AND_REFLECT_SCHEMAS - Static variable in class org.apache.crunch.types.avro.Avros
Older versions of Avro (i.e., before 1.7.0) do not support schemas that are composed of a mix of specific and reflection-based schemas.
Cartesian - Class in org.apache.crunch.lib
Utilities for Cartesian products of two PTable or PCollection instances.
Cartesian() - Constructor for class org.apache.crunch.lib.Cartesian
 
Channels - Class in org.apache.crunch.lib
Utilities for splitting Pair instances emitted by DoFn into separate PCollection instances.
Channels() - Constructor for class org.apache.crunch.lib.Channels
 
checkCombiningSpecificAndReflectionSchemas() - Static method in class org.apache.crunch.types.avro.Avros
 
cleanup(Emitter<Pair<String, BloomFilter>>) - Method in class org.apache.crunch.contrib.bloomfilter.BloomFilterFn
 
cleanup(Emitter<T>) - Method in class org.apache.crunch.DoFn
Called during the cleanup of the MapReduce job this DoFn is associated with.
cleanup(Emitter<T>) - Method in class org.apache.crunch.FilterFn
 
cleanup() - Method in class org.apache.crunch.FilterFn
Called during the cleanup of the MapReduce job this FilterFn is associated with.
cleanup(Emitter<T>) - Method in class org.apache.crunch.fn.CompositeMapFn
 
cleanup(Emitter<Pair<S, T>>) - Method in class org.apache.crunch.fn.PairMapFn
 
cleanup(boolean) - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
cleanup(boolean) - Method in class org.apache.crunch.impl.mem.MemPipeline
 
cleanup(Emitter<Pair<Integer, Pair<K, V>>>) - Method in class org.apache.crunch.lib.Aggregate.TopKFn
 
cleanup(Emitter<Pair<K, Pair<U, V>>>) - Method in class org.apache.crunch.lib.join.FullOuterJoinFn
Called during the cleanup of the MapReduce job this DoFn is associated with.
cleanup(Emitter<Pair<K, Pair<U, V>>>) - Method in class org.apache.crunch.lib.join.LeftOuterJoinFn
Called during the cleanup of the MapReduce job this DoFn is associated with.
cleanup(boolean) - Method in interface org.apache.crunch.Pipeline
Cleans up any artifacts created as a result of running the pipeline.
clear() - Method in class org.apache.crunch.types.writable.TupleWritable
 
clearCounters() - Static method in class org.apache.crunch.impl.mem.MemPipeline
 
clearCounters() - Static method in class org.apache.crunch.test.TestCounters
 
close() - Method in class org.apache.crunch.io.CrunchOutputs
 
cogroup(PTable<K, U>) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
Cogroup - Class in org.apache.crunch.lib
 
Cogroup() - Constructor for class org.apache.crunch.lib.Cogroup
 
cogroup(PTable<K, U>, PTable<K, V>) - Static method in class org.apache.crunch.lib.Cogroup
Co-groups the two PTable arguments.
cogroup(int, PTable<K, U>, PTable<K, V>) - Static method in class org.apache.crunch.lib.Cogroup
Co-groups the two PTable arguments with a user-specified degree of parallelism (a.k.a. the number of reducers).
cogroup(PTable<K, V1>, PTable<K, V2>, PTable<K, V3>) - Static method in class org.apache.crunch.lib.Cogroup
Co-groups the three PTable arguments.
cogroup(int, PTable<K, V1>, PTable<K, V2>, PTable<K, V3>) - Static method in class org.apache.crunch.lib.Cogroup
Co-groups the three PTable arguments with a user-specified degree of parallelism (a.k.a. the number of reducers).
cogroup(PTable<K, V1>, PTable<K, V2>, PTable<K, V3>, PTable<K, V4>) - Static method in class org.apache.crunch.lib.Cogroup
Co-groups the four PTable arguments.
cogroup(int, PTable<K, V1>, PTable<K, V2>, PTable<K, V3>, PTable<K, V4>) - Static method in class org.apache.crunch.lib.Cogroup
Co-groups the four PTable arguments with a user-specified degree of parallelism (a.k.a. the number of reducers).
cogroup(PTable<K, ?>, PTable<K, ?>...) - Static method in class org.apache.crunch.lib.Cogroup
Co-groups an arbitrary number of PTable arguments.
cogroup(int, PTable<K, ?>, PTable<K, ?>...) - Static method in class org.apache.crunch.lib.Cogroup
Co-groups an arbitrary number of PTable arguments with a user-specified degree of parallelism (a.k.a. the number of reducers). The largest table should come last in the ordering.
cogroup(PTable<K, U>) - Method in interface org.apache.crunch.PTable
Co-group operation with the given table on common keys.
CollectionDeepCopier<T> - Class in org.apache.crunch.types
Performs deep copies (based on underlying PType deep copying) of Collections.
CollectionDeepCopier(PType<T>) - Constructor for class org.apache.crunch.types.CollectionDeepCopier
 
collectionOf(T...) - Static method in class org.apache.crunch.impl.mem.MemPipeline
 
collectionOf(Iterable<T>) - Static method in class org.apache.crunch.impl.mem.MemPipeline
 
collections(PType<T>) - Static method in class org.apache.crunch.types.avro.Avros
 
collections(PType<T>) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
collections(PType<T>) - Method in interface org.apache.crunch.types.PTypeFamily
 
collections(PType<T>) - Static method in class org.apache.crunch.types.writable.Writables
 
collections(PType<T>) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
collectValues() - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
collectValues(PTable<K, V>) - Static method in class org.apache.crunch.lib.Aggregate
 
collectValues() - Method in interface org.apache.crunch.PTable
Aggregate all of the values with the same key into a single key-value pair in the returned PTable.
column() - Method in class org.apache.crunch.lib.Sort.ColumnOrder
 
CombineFn<S,T> - Class in org.apache.crunch
A special DoFn implementation that converts an Iterable of values into a single value.
CombineFn() - Constructor for class org.apache.crunch.CombineFn
 
CombineMapsideFunction<K,V> - Class in org.apache.crunch.impl.spark.fn
 
CombineMapsideFunction(CombineFn<K, V>, SparkRuntimeContext) - Constructor for class org.apache.crunch.impl.spark.fn.CombineMapsideFunction
 
combineValues(CombineFn<K, V>, CombineFn<K, V>) - Method in class org.apache.crunch.impl.dist.collect.BaseGroupedTable
 
combineValues(CombineFn<K, V>) - Method in class org.apache.crunch.impl.dist.collect.BaseGroupedTable
 
combineValues(Aggregator<V>) - Method in class org.apache.crunch.impl.dist.collect.BaseGroupedTable
 
combineValues(Aggregator<V>, Aggregator<V>) - Method in class org.apache.crunch.impl.dist.collect.BaseGroupedTable
 
combineValues(CombineFn<K, V>) - Method in interface org.apache.crunch.PGroupedTable
Combines the values of this grouping using the given CombineFn.
combineValues(CombineFn<K, V>, CombineFn<K, V>) - Method in interface org.apache.crunch.PGroupedTable
Combines and reduces the values of this grouping using the given CombineFn instances.
combineValues(Aggregator<V>) - Method in interface org.apache.crunch.PGroupedTable
Combine the values in each group using the given Aggregator.
combineValues(Aggregator<V>, Aggregator<V>) - Method in interface org.apache.crunch.PGroupedTable
Combines and reduces the values in each group using the given Aggregator instances.
comm(PCollection<T>, PCollection<T>) - Static method in class org.apache.crunch.lib.Set
Find the elements that are common to two sets, like the Unix comm utility.
compare(ByteArray, ByteArray) - Method in class org.apache.crunch.impl.spark.SparkComparator
 
compare(Pair<K, V>, Pair<K, V>) - Method in class org.apache.crunch.lib.Aggregate.PairValueComparator
 
compare(AvroWrapper<T>, AvroWrapper<T>) - Method in class org.apache.crunch.lib.join.JoinUtils.AvroPairGroupingComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.crunch.lib.join.JoinUtils.AvroPairGroupingComparator
 
compare(TupleWritable, TupleWritable) - Method in class org.apache.crunch.lib.join.JoinUtils.TupleWritableComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.crunch.lib.join.JoinUtils.TupleWritableComparator
 
compare(AvroKey<T>, AvroKey<T>) - Method in class org.apache.crunch.lib.sort.ReverseAvroComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.crunch.lib.sort.ReverseAvroComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.crunch.lib.sort.ReverseWritableComparator
 
compare(T, T) - Method in class org.apache.crunch.lib.sort.ReverseWritableComparator
 
compare(WritableComparable, WritableComparable) - Method in class org.apache.crunch.lib.sort.TupleWritableComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.crunch.types.writable.TupleWritable.Comparator
 
compare(WritableComparable, WritableComparable) - Method in class org.apache.crunch.types.writable.TupleWritable.Comparator
 
compareTo(ByteArray) - Method in class org.apache.crunch.impl.spark.ByteArray
 
compareTo(Pair<K, V>) - Method in class org.apache.crunch.Pair
 
compareTo(TupleWritable) - Method in class org.apache.crunch.types.writable.TupleWritable
 
compareTo(UnionWritable) - Method in class org.apache.crunch.types.writable.UnionWritable
 
CompositeMapFn<R,S,T> - Class in org.apache.crunch.fn
 
CompositeMapFn(MapFn<R, S>, MapFn<S, T>) - Constructor for class org.apache.crunch.fn.CompositeMapFn
 
CompositePathIterable<T> - Class in org.apache.crunch.io
 
conf(String, String) - Method in class org.apache.crunch.GroupingOptions.Builder
 
conf(String, String) - Method in class org.apache.crunch.ParallelDoOptions.Builder
Specifies key-value pairs that should be added to the Configuration object associated with the Job that includes these options.
conf(String, String) - Method in interface org.apache.crunch.SourceTarget
Adds the given key-value pair to the Configuration instance(s) that are used to read and write this SourceTarget<T>.
configure(Configuration) - Method in class org.apache.crunch.DoFn
Configure this DoFn.
configure(Configuration) - Method in class org.apache.crunch.fn.CompositeMapFn
 
configure(Configuration) - Method in class org.apache.crunch.fn.ExtractKeyFn
 
configure(Configuration) - Method in class org.apache.crunch.fn.PairMapFn
 
configure(Job) - Method in class org.apache.crunch.GroupingOptions
 
configure(Configuration) - Method in class org.apache.crunch.io.FormatBundle
 
configure(Target, PType<?>) - Method in interface org.apache.crunch.io.OutputHandler
 
configure(Configuration) - Method in class org.apache.crunch.ParallelDoOptions
Applies the key-value pairs that were associated with this instance to the given Configuration object.
configure(Configuration) - Method in interface org.apache.crunch.ReadableData
Allows this instance to specify any additional configuration settings that may be needed by the job that it is launched in.
configure(FormatBundle) - Method in class org.apache.crunch.types.avro.AvroMode
Populates the bundle with mode specific settings for the specific FormatBundle.
configure(Configuration) - Method in class org.apache.crunch.types.avro.AvroMode
Populates the conf with mode-specific settings.
configure(Configuration) - Method in class org.apache.crunch.types.avro.AvroUtf8InputFormat
 
configure(Configuration) - Method in class org.apache.crunch.types.PGroupedTableType.PairIterableMapFn
 
configure(Configuration) - Method in class org.apache.crunch.util.DelegatingReadableData
 
configure(Configuration) - Method in class org.apache.crunch.util.UnionReadableData
 
configureFactory(Configuration) - Method in class org.apache.crunch.types.avro.AvroMode
Deprecated. Use AvroMode.configure(org.apache.hadoop.conf.Configuration) instead.
configureForMapReduce(Job, PType<?>, Path, String) - Method in interface org.apache.crunch.io.MapReduceTarget
 
configureOrdering(Configuration, WritableType[], Sort.ColumnOrder[]) - Static method in class org.apache.crunch.lib.sort.TupleWritableComparator
 
configureReflectDataFactory(Configuration) - Static method in class org.apache.crunch.types.avro.Avros
Deprecated. As of 0.9.0; use AvroMode.REFLECT.configure(Configuration) instead.
configureShuffle(Configuration) - Method in class org.apache.crunch.types.avro.AvroMode
Populates the conf with mode-specific settings for use during the shuffle phase.
configureShuffle(Job, GroupingOptions) - Method in class org.apache.crunch.types.PGroupedTableType
 
configureSource(Job, int) - Method in interface org.apache.crunch.Source
Configure the given job to use this source as an input.
containers(Class<T>) - Static method in class org.apache.crunch.types.avro.Avros
 
containers(Class<T>) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
convert(PType<T>, PTypeFamily) - Static method in class org.apache.crunch.types.PTypeUtils
 
Converter<K,V,S,T> - Interface in org.apache.crunch.types
Converts the input key/value from a MapReduce task into the input to a DoFn, or takes the output of a DoFn and writes it to the output key/values.
convertInput(K, V) - Method in interface org.apache.crunch.types.Converter
 
convertIterableInput(K, Iterable<V>) - Method in interface org.apache.crunch.types.Converter
 
copyResourceFile(String) - Method in class org.apache.crunch.test.TemporaryPath
Copy a classpath resource to a File.
copyResourceFileName(String) - Method in class org.apache.crunch.test.TemporaryPath
Copy a classpath resource returning its absolute file name.
copyResourcePath(String) - Method in class org.apache.crunch.test.TemporaryPath
Copy a classpath resource to a Path.
count() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
count(PCollection<S>) - Static method in class org.apache.crunch.lib.Aggregate
Returns a PTable that contains the unique elements of this collection mapped to a count of their occurrences.
count(PCollection<S>, int) - Static method in class org.apache.crunch.lib.Aggregate
Returns a PTable that contains the unique elements of this collection mapped to a count of their occurrences.
count() - Method in interface org.apache.crunch.PCollection
Returns a PTable instance that contains the counts of each unique element of this PCollection.
countClause - Variable in class org.apache.crunch.contrib.io.jdbc.DataBaseSource.Builder
 
CounterAccumulatorParam - Class in org.apache.crunch.impl.spark
 
CounterAccumulatorParam() - Constructor for class org.apache.crunch.impl.spark.CounterAccumulatorParam
 
create(String) - Method in class org.apache.crunch.contrib.text.TokenizerFactory
Return a Scanner instance that wraps the input string and uses the delimiter, skip, and locale settings for this TokenizerFactory instance.
create(FileSystem, Path, FileReaderFactory<S>) - Static method in class org.apache.crunch.io.CompositePathIterable
 
create() - Static method in class org.apache.crunch.lib.join.MapsideJoinStrategy
Create a new MapsideJoinStrategy instance that will load its left-side table into memory, and will materialize the contents of the left-side table to disk before running the in-memory join.
create(boolean) - Static method in class org.apache.crunch.lib.join.MapsideJoinStrategy
Create a new MapsideJoinStrategy instance that will load its left-side table into memory.
create() - Method in class org.apache.crunch.test.TemporaryPath
 
create() - Static method in class org.apache.crunch.types.NoOpDeepCopier
Static factory method.
create(Class<T>, Class...) - Static method in class org.apache.crunch.types.TupleFactory
 
createDoCollection(String, PCollectionImpl<S>, DoFn<S, T>, PType<T>, ParallelDoOptions) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionFactory
 
createDoCollection(String, PCollectionImpl<S>, DoFn<S, T>, PType<T>, ParallelDoOptions) - Method in class org.apache.crunch.impl.spark.collect.SparkCollectFactory
 
createDoNode() - Method in interface org.apache.crunch.impl.dist.collect.MRCollection
 
createDoTable(String, PCollectionImpl<S>, DoFn<S, Pair<K, V>>, PTableType<K, V>, ParallelDoOptions) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionFactory
 
createDoTable(String, PCollectionImpl<S>, CombineFn<K, V>, DoFn<S, Pair<K, V>>, PTableType<K, V>) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionFactory
 
createDoTable(String, PCollectionImpl<S>, DoFn<S, Pair<K, V>>, PTableType<K, V>, ParallelDoOptions) - Method in class org.apache.crunch.impl.spark.collect.SparkCollectFactory
 
createDoTable(String, PCollectionImpl<S>, CombineFn<K, V>, DoFn<S, Pair<K, V>>, PTableType<K, V>) - Method in class org.apache.crunch.impl.spark.collect.SparkCollectFactory
 
createFilter(Path, BloomFilterFn<String>) - Static method in class org.apache.crunch.contrib.bloomfilter.BloomFilterFactory
Takes an input path and generates BloomFilters for all text files in that path.
createFilter(PCollection<T>, BloomFilterFn<T>) - Static method in class org.apache.crunch.contrib.bloomfilter.BloomFilterFactory
 
createGroupedTable(PTableBase<K, V>, GroupingOptions) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionFactory
 
createGroupedTable(PTableBase<K, V>, GroupingOptions) - Method in class org.apache.crunch.impl.spark.collect.SparkCollectFactory
 
createInputCollection(Source<S>, DistributedPipeline) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionFactory
 
createInputCollection(Source<S>, DistributedPipeline) - Method in class org.apache.crunch.impl.spark.collect.SparkCollectFactory
 
createInputTable(TableSource<K, V>, DistributedPipeline) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionFactory
 
createInputTable(TableSource<K, V>, DistributedPipeline) - Method in class org.apache.crunch.impl.spark.collect.SparkCollectFactory
 
createIntermediateOutput(PType<T>) - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
createOrderedTupleSchema(PType<S>, Sort.ColumnOrder[]) - Static method in class org.apache.crunch.lib.sort.SortFns
Constructs an Avro schema for the given PType<S> that respects the given column orderings.
createPut(PTable<String, String>) - Method in class org.apache.crunch.examples.WordAggregationHBase
Create puts in order to insert them into HBase.
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.apache.crunch.types.avro.AvroInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.apache.crunch.types.avro.AvroUtf8InputFormat
 
createTempPath() - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
createUnionCollection(List<? extends PCollectionImpl<S>>) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionFactory
 
createUnionCollection(List<? extends PCollectionImpl<S>>) - Method in class org.apache.crunch.impl.spark.collect.SparkCollectFactory
 
createUnionTable(List<PTableBase<K, V>>) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionFactory
 
createUnionTable(List<PTableBase<K, V>>) - Method in class org.apache.crunch.impl.spark.collect.SparkCollectFactory
 
cross(PTable<K1, U>, PTable<K2, V>) - Static method in class org.apache.crunch.lib.Cartesian
Performs a full cross join on the specified PTables (using the same strategy as Pig's CROSS operator).
cross(PTable<K1, U>, PTable<K2, V>, int) - Static method in class org.apache.crunch.lib.Cartesian
Performs a full cross join on the specified PTables (using the same strategy as Pig's CROSS operator).
cross(PCollection<U>, PCollection<V>) - Static method in class org.apache.crunch.lib.Cartesian
Performs a full cross join on the specified PCollections (using the same strategy as Pig's CROSS operator).
cross(PCollection<U>, PCollection<V>, int) - Static method in class org.apache.crunch.lib.Cartesian
Performs a full cross join on the specified PCollections (using the same strategy as Pig's CROSS operator).
CRUNCH_DISABLE_OUTPUT_COUNTERS - Static variable in class org.apache.crunch.io.CrunchOutputs
 
CRUNCH_FILTER_NAME - Static variable in class org.apache.crunch.contrib.bloomfilter.BloomFilterFn
 
CRUNCH_FILTER_SIZE - Static variable in class org.apache.crunch.contrib.bloomfilter.BloomFilterFn
 
CRUNCH_INPUTS - Static variable in class org.apache.crunch.io.CrunchInputs
 
CRUNCH_OUTPUTS - Static variable in class org.apache.crunch.io.CrunchOutputs
 
CrunchInputs - Class in org.apache.crunch.io
Helper functions for configuring multiple InputFormat instances within a single Crunch MapReduce job.
CrunchInputs() - Constructor for class org.apache.crunch.io.CrunchInputs
 
CrunchIterable<S,T> - Class in org.apache.crunch.impl.spark.fn
 
CrunchIterable(DoFn<S, T>, Iterator<S>) - Constructor for class org.apache.crunch.impl.spark.fn.CrunchIterable
 
CrunchOutputs<K,V> - Class in org.apache.crunch.io
An analogue of CrunchInputs for handling multiple OutputFormat instances writing to multiple files within a single MapReduce job.
CrunchOutputs(TaskInputOutputContext<?, ?, K, V>) - Constructor for class org.apache.crunch.io.CrunchOutputs
Creates and initializes multiple-outputs support; it should be instantiated in the Mapper/Reducer setup method.
CrunchRuntimeException - Exception in org.apache.crunch
A RuntimeException implementation that includes some additional options for the Crunch execution engine to track reporting status.
CrunchRuntimeException(String) - Constructor for exception org.apache.crunch.CrunchRuntimeException
 
CrunchRuntimeException(Exception) - Constructor for exception org.apache.crunch.CrunchRuntimeException
 
CrunchRuntimeException(String, Exception) - Constructor for exception org.apache.crunch.CrunchRuntimeException
 
CrunchTestSupport - Class in org.apache.crunch.test
A temporary workaround for Scala tests to use when working with Rule annotations until it gets fixed in JUnit 4.11.
CrunchTestSupport() - Constructor for class org.apache.crunch.test.CrunchTestSupport
 
CrunchTool - Class in org.apache.crunch.util
An extension of the Tool interface that creates a Pipeline instance and provides methods for working with the Pipeline from inside of the Tool's run method.
CrunchTool() - Constructor for class org.apache.crunch.util.CrunchTool
 
CrunchTool(boolean) - Constructor for class org.apache.crunch.util.CrunchTool
 

D

DataBaseSource<T extends org.apache.hadoop.mapreduce.lib.db.DBWritable & org.apache.hadoop.io.Writable> - Class in org.apache.crunch.contrib.io.jdbc
Source for reading from a database via a JDBC connection.
DataBaseSource.Builder<T extends org.apache.hadoop.mapreduce.lib.db.DBWritable & org.apache.hadoop.io.Writable> - Class in org.apache.crunch.contrib.io.jdbc
 
DataBaseSource.Builder(Class<T>) - Constructor for class org.apache.crunch.contrib.io.jdbc.DataBaseSource.Builder
 
DebugLogging - Class in org.apache.crunch.test
Allows direct manipulation of the Hadoop log4j settings to aid in unit testing.
DeepCopier<T> - Interface in org.apache.crunch.types
Performs deep copies of values.
deepCopy(Object) - Method in class org.apache.crunch.types.avro.AvroDerivedValueDeepCopier
 
deepCopy(Collection<T>) - Method in class org.apache.crunch.types.CollectionDeepCopier
 
deepCopy(T) - Method in interface org.apache.crunch.types.DeepCopier
Create a deep copy of a value.
deepCopy(Map<String, T>) - Method in class org.apache.crunch.types.MapDeepCopier
 
deepCopy(T) - Method in class org.apache.crunch.types.NoOpDeepCopier
 
deepCopy(T) - Method in class org.apache.crunch.types.TupleDeepCopier
 
deepCopy(Union) - Method in class org.apache.crunch.types.UnionDeepCopier
 
deepCopy(T) - Method in class org.apache.crunch.types.writable.WritableDeepCopier
 
DEFAULT - Static variable in class org.apache.crunch.CachingOptions
An instance of CachingOptions with the default caching settings.
DEFAULT_BYTES_PER_REDUCE_TASK - Static variable in class org.apache.crunch.util.PartitionUtils
 
DEFAULT_MAX_REDUCERS - Static variable in class org.apache.crunch.util.PartitionUtils
 
DEFAULT_PATH - Static variable in class org.apache.crunch.lib.sort.TotalOrderPartitioner
 
DefaultJoinStrategy<K,U,V> - Class in org.apache.crunch.lib.join
Default join strategy that simply sends all data through the map, shuffle, and reduce phase.
DefaultJoinStrategy() - Constructor for class org.apache.crunch.lib.join.DefaultJoinStrategy
 
DefaultJoinStrategy(int) - Constructor for class org.apache.crunch.lib.join.DefaultJoinStrategy
 
DelegatingReadableData<S,T> - Class in org.apache.crunch.util
Implements the ReadableData<T> interface by delegating to a ReadableData<S> instance and passing its contents through a DoFn<S, T>.
DelegatingReadableData(ReadableData<S>, DoFn<S, T>) - Constructor for class org.apache.crunch.util.DelegatingReadableData
 
delete() - Method in class org.apache.crunch.test.TemporaryPath
 
delimiter(String) - Method in class org.apache.crunch.contrib.text.TokenizerFactory.Builder
Sets the delimiter used by the TokenizerFactory instances constructed by this instance.
derived(PType<V1>, PType<V2>, PType<V3>) - Static method in class org.apache.crunch.Tuple3.Collect
 
derived(PType<V1>, PType<V2>, PType<V3>, PType<V4>) - Static method in class org.apache.crunch.Tuple4.Collect
 
derived(Class<T>, MapFn<S, T>, MapFn<T, S>, PType<S>) - Static method in class org.apache.crunch.types.avro.Avros
 
derived(Class<T>, MapFn<S, T>, MapFn<T, S>, PType<S>) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
derived(Class<T>, MapFn<S, T>, MapFn<T, S>, PType<S>) - Method in interface org.apache.crunch.types.PTypeFamily
 
derived(Class<T>, MapFn<S, T>, MapFn<T, S>, PType<S>) - Static method in class org.apache.crunch.types.writable.Writables
 
derived(Class<T>, MapFn<S, T>, MapFn<T, S>, PType<S>) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
derivedImmutable(Class<T>, MapFn<S, T>, MapFn<T, S>, PType<S>) - Static method in class org.apache.crunch.types.avro.Avros
 
derivedImmutable(Class<T>, MapFn<S, T>, MapFn<T, S>, PType<S>) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
derivedImmutable(Class<T>, MapFn<S, T>, MapFn<T, S>, PType<S>) - Method in interface org.apache.crunch.types.PTypeFamily
A derived type whose values are immutable.
derivedImmutable(Class<T>, MapFn<S, T>, MapFn<T, S>, PType<S>) - Static method in class org.apache.crunch.types.writable.Writables
 
derivedImmutable(Class<T>, MapFn<S, T>, MapFn<T, S>, PType<S>) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
deserialized(boolean) - Method in class org.apache.crunch.CachingOptions.Builder
 
deserialized() - Method in class org.apache.crunch.CachingOptions
Whether the data should remain deserialized in the cache, which trades off CPU processing time for additional storage overhead.
difference(PCollection<T>, PCollection<T>) - Static method in class org.apache.crunch.lib.Set
Compute the set difference between two sets of elements.
disableDeepCopy() - Method in class org.apache.crunch.DoFn
By default, Crunch will do a defensive deep copy of the outputs of a DoFn when there are multiple downstream consumers of that item, in order to prevent the downstream functions from making concurrent modifications to data objects.
DistCache - Class in org.apache.crunch.util
Provides functions for working with Hadoop's distributed cache.
DistCache() - Constructor for class org.apache.crunch.util.DistCache
 
Distinct - Class in org.apache.crunch.lib
Functions for computing the distinct elements of a PCollection.
distinct(PCollection<S>) - Static method in class org.apache.crunch.lib.Distinct
Construct a new PCollection that contains the unique elements of a given input PCollection.
distinct(PTable<K, V>) - Static method in class org.apache.crunch.lib.Distinct
A PTable<K, V> analogue of the distinct function.
distinct(PCollection<S>, int) - Static method in class org.apache.crunch.lib.Distinct
A distinct operation that gives the client more control over how frequently elements are flushed to disk in order to allow control over performance or memory consumption.
distinct(PTable<K, V>, int) - Static method in class org.apache.crunch.lib.Distinct
A PTable<K, V> analogue of the distinct function.
DistributedPipeline - Class in org.apache.crunch.impl.dist
 
DistributedPipeline(String, Configuration, PCollectionFactory) - Constructor for class org.apache.crunch.impl.dist.DistributedPipeline
Instantiate with a custom name and configuration.
DoCollection<S> - Class in org.apache.crunch.impl.spark.collect
 
DoFn<S,T> - Class in org.apache.crunch
Base class for all data processing functions in Crunch.
DoFn() - Constructor for class org.apache.crunch.DoFn
 
DoFnIterator<S,T> - Class in org.apache.crunch.util
An Iterator<T> that combines a delegate Iterator<S> and a DoFn<S, T>, generating data by passing the contents of the iterator through the function.
DoFnIterator(Iterator<S>, DoFn<S, T>) - Constructor for class org.apache.crunch.util.DoFnIterator
 
done() - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
done() - Method in class org.apache.crunch.impl.mem.MemPipeline
 
done() - Method in class org.apache.crunch.impl.spark.SparkPipeline
 
done() - Method in interface org.apache.crunch.Pipeline
Run any remaining jobs required to generate outputs and then clean up any intermediate data files that were created in this run or previous calls to run.
DONE - Static variable in class org.apache.crunch.PipelineResult
 
done() - Method in class org.apache.crunch.util.CrunchTool
 
DoTable<K,V> - Class in org.apache.crunch.impl.spark.collect
 
doubles() - Static method in class org.apache.crunch.types.avro.Avros
 
doubles() - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
doubles() - Method in interface org.apache.crunch.types.PTypeFamily
 
doubles() - Static method in class org.apache.crunch.types.writable.Writables
 
doubles() - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
drop(Integer...) - Method in class org.apache.crunch.contrib.text.TokenizerFactory.Builder
Drop the specified fields found by the input scanner, counting from zero.

E

emit(T) - Method in interface org.apache.crunch.Emitter
Write the emitted value to the next stage of the pipeline.
Emitter<T> - Interface in org.apache.crunch
Interface for writing outputs from a DoFn.
EMPTY - Static variable in class org.apache.crunch.PipelineResult
 
EmptyPCollection<T> - Class in org.apache.crunch.impl.dist.collect
 
EmptyPCollection(DistributedPipeline, PType<T>) - Constructor for class org.apache.crunch.impl.dist.collect.EmptyPCollection
 
emptyPCollection(PType<S>) - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
emptyPCollection(PType<T>) - Method in class org.apache.crunch.impl.mem.MemPipeline
 
EmptyPCollection<T> - Class in org.apache.crunch.impl.spark.collect
 
EmptyPCollection(DistributedPipeline, PType<T>) - Constructor for class org.apache.crunch.impl.spark.collect.EmptyPCollection
 
emptyPCollection(PType<S>) - Method in class org.apache.crunch.impl.spark.SparkPipeline
 
emptyPCollection(PType<T>) - Method in interface org.apache.crunch.Pipeline
 
EmptyPTable<K,V> - Class in org.apache.crunch.impl.dist.collect
 
EmptyPTable(DistributedPipeline, PTableType<K, V>) - Constructor for class org.apache.crunch.impl.dist.collect.EmptyPTable
 
emptyPTable(PTableType<K, V>) - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
emptyPTable(PTableType<K, V>) - Method in class org.apache.crunch.impl.mem.MemPipeline
 
EmptyPTable<K,V> - Class in org.apache.crunch.impl.spark.collect
 
EmptyPTable(DistributedPipeline, PTableType<K, V>) - Constructor for class org.apache.crunch.impl.spark.collect.EmptyPTable
 
emptyPTable(PTableType<K, V>) - Method in class org.apache.crunch.impl.spark.SparkPipeline
 
emptyPTable(PTableType<K, V>) - Method in interface org.apache.crunch.Pipeline
 
enable(Level) - Static method in class org.apache.crunch.test.DebugLogging
Enables logging Hadoop output to the console using the pattern '%-4r [%t] %-5p %c %x - %m%n' at the specified Level.
enable(Level, Appender) - Static method in class org.apache.crunch.test.DebugLogging
Enables logging to the given Appender at the specified Level.
enableDebug() - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
enableDebug() - Method in class org.apache.crunch.impl.mem.MemPipeline
 
enableDebug() - Method in interface org.apache.crunch.Pipeline
Turn on debug logging for jobs that are run from this pipeline.
enableDebug() - Method in class org.apache.crunch.util.CrunchTool
 
enums(Class<T>, PTypeFamily) - Static method in class org.apache.crunch.types.PTypes
Constructs a PType for a Java Enum type.
equals(Object) - Method in class org.apache.crunch.impl.dist.collect.BaseInputCollection
 
equals(Object) - Method in class org.apache.crunch.impl.dist.collect.BaseInputTable
 
equals(Object) - Method in class org.apache.crunch.impl.spark.ByteArray
 
equals(Object) - Method in class org.apache.crunch.impl.spark.IntByteArray
 
equals(Object) - Method in class org.apache.crunch.io.FormatBundle
 
equals(Object) - Method in class org.apache.crunch.Pair
 
equals(Object) - Method in class org.apache.crunch.Tuple3
 
equals(Object) - Method in class org.apache.crunch.Tuple4
 
equals(Object) - Method in class org.apache.crunch.TupleN
 
equals(Object) - Method in class org.apache.crunch.types.avro.AvroMode
 
equals(Object) - Method in class org.apache.crunch.types.avro.AvroType
 
equals(Object) - Method in class org.apache.crunch.types.writable.TupleWritable
 
equals(Object) - Method in class org.apache.crunch.types.writable.WritableType
 
equals(Object) - Method in class org.apache.crunch.Union
 
errorOnLastRecord() - Method in class org.apache.crunch.contrib.text.AbstractCompositeExtractor
 
errorOnLastRecord() - Method in class org.apache.crunch.contrib.text.AbstractSimpleExtractor
 
errorOnLastRecord() - Method in interface org.apache.crunch.contrib.text.Extractor
Returns true if the last call to extract on this instance threw an exception that was handled.
execute() - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
extract(String) - Method in class org.apache.crunch.contrib.text.AbstractCompositeExtractor
 
extract(String) - Method in class org.apache.crunch.contrib.text.AbstractSimpleExtractor
 
extract(String) - Method in interface org.apache.crunch.contrib.text.Extractor
Extract a value with the type of this instance.
extractKey(String) - Static method in class org.apache.crunch.types.Protos
 
ExtractKeyFn<K,V> - Class in org.apache.crunch.fn
Wrapper function that uses a MapFn to extract a key from each value, producing the key-value pairs used to convert a PCollection<V> into a PTable<K, V>.
ExtractKeyFn(MapFn<V, K>) - Constructor for class org.apache.crunch.fn.ExtractKeyFn
 
Extractor<T> - Interface in org.apache.crunch.contrib.text
An interface for extracting a specific data type from a text string that is being processed by a Scanner object.
Extractors - Class in org.apache.crunch.contrib.text
Factory methods for constructing common Extractor types.
Extractors() - Constructor for class org.apache.crunch.contrib.text.Extractors
 
ExtractorStats - Class in org.apache.crunch.contrib.text
Records the number and kind of errors that an Extractor encountered when parsing input data.
ExtractorStats(int) - Constructor for class org.apache.crunch.contrib.text.ExtractorStats
 
ExtractorStats(int, List<Integer>) - Constructor for class org.apache.crunch.contrib.text.ExtractorStats
 
extractText(PTable<ImmutableBytesWritable, Result>) - Method in class org.apache.crunch.examples.WordAggregationHBase
Extract information from HBase.

F

FileNamingScheme - Interface in org.apache.crunch.io
Encapsulates rules for naming output files.
FileReaderFactory<T> - Interface in org.apache.crunch.io
 
filter(FilterFn<S>) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
filter(String, FilterFn<S>) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
filter(FilterFn<Pair<K, V>>) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
filter(String, FilterFn<Pair<K, V>>) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
filter(FilterFn<S>) - Method in interface org.apache.crunch.PCollection
Apply the given filter function to this instance and return the resulting PCollection.
filter(String, FilterFn<S>) - Method in interface org.apache.crunch.PCollection
Apply the given filter function to this instance and return the resulting PCollection.
filter(FilterFn<Pair<K, V>>) - Method in interface org.apache.crunch.PTable
Apply the given filter function to this instance and return the resulting PTable.
filter(String, FilterFn<Pair<K, V>>) - Method in interface org.apache.crunch.PTable
Apply the given filter function to this instance and return the resulting PTable.
FilterFn<T> - Class in org.apache.crunch
A DoFn for the common case of filtering the members of a PCollection based on a boolean condition.
FilterFn() - Constructor for class org.apache.crunch.FilterFn
 
FilterFns - Class in org.apache.crunch.fn
A collection of pre-defined FilterFn implementations.
findContainingJar(Class<?>) - Static method in class org.apache.crunch.util.DistCache
Finds the path to a jar that contains the class provided, if any.
findCounter(Enum<?>) - Method in class org.apache.crunch.PipelineResult.StageResult
Deprecated. The Counter class changed incompatibly between Hadoop 1 and 2 (from a class to an interface) so user programs should avoid this method and use PipelineResult.StageResult.getCounterValue(Enum) and/or PipelineResult.StageResult.getCounterDisplayName(Enum).
first() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
first() - Method in class org.apache.crunch.Pair
 
first() - Method in interface org.apache.crunch.PCollection
 
first() - Method in class org.apache.crunch.Tuple3
 
first() - Method in class org.apache.crunch.Tuple4
 
FIRST_N(int) - Static method in class org.apache.crunch.fn.Aggregators
Return the first n values (or fewer if there are fewer values than n).
FlatMapDoFn<S,T> - Class in org.apache.crunch.impl.spark.fn
 
FlatMapDoFn(DoFn<S, T>, SparkRuntimeContext) - Constructor for class org.apache.crunch.impl.spark.fn.FlatMapDoFn
 
FlatMapPairDoFn<K,V,T> - Class in org.apache.crunch.impl.spark.fn
 
FlatMapPairDoFn(DoFn<Pair<K, V>, T>, SparkRuntimeContext) - Constructor for class org.apache.crunch.impl.spark.fn.FlatMapPairDoFn
 
floats() - Static method in class org.apache.crunch.types.avro.Avros
 
floats() - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
floats() - Method in interface org.apache.crunch.types.PTypeFamily
 
floats() - Static method in class org.apache.crunch.types.writable.Writables
 
floats() - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
flush() - Method in interface org.apache.crunch.Emitter
Flushes any values cached by this emitter.
forInput(Class<T>) - Static method in class org.apache.crunch.io.FormatBundle
 
FormatBundle<K> - Class in org.apache.crunch.io
A combination of an InputFormat or OutputFormat and any extra configuration information that format class needs to run.
FormatBundle() - Constructor for class org.apache.crunch.io.FormatBundle
 
formattedFile(String, Class<? extends FileInputFormat<K, V>>, Class<K>, Class<V>) - Static method in class org.apache.crunch.io.From
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
formattedFile(Path, Class<? extends FileInputFormat<K, V>>, Class<K>, Class<V>) - Static method in class org.apache.crunch.io.From
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
formattedFile(List<Path>, Class<? extends FileInputFormat<K, V>>, Class<K>, Class<V>) - Static method in class org.apache.crunch.io.From
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
formattedFile(String, Class<? extends FileInputFormat<?, ?>>, PType<K>, PType<V>) - Static method in class org.apache.crunch.io.From
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
formattedFile(Path, Class<? extends FileInputFormat<?, ?>>, PType<K>, PType<V>) - Static method in class org.apache.crunch.io.From
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
formattedFile(List<Path>, Class<? extends FileInputFormat<?, ?>>, PType<K>, PType<V>) - Static method in class org.apache.crunch.io.From
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
formattedFile(String, Class<? extends FileOutputFormat<K, V>>) - Static method in class org.apache.crunch.io.To
Creates a Target at the given path name that writes data to a custom FileOutputFormat.
formattedFile(Path, Class<? extends FileOutputFormat<K, V>>) - Static method in class org.apache.crunch.io.To
Creates a Target at the given Path that writes data to a custom FileOutputFormat.
forOutput(Class<T>) - Static method in class org.apache.crunch.io.FormatBundle
 
fourth() - Method in class org.apache.crunch.Tuple4
 
From - Class in org.apache.crunch.io
Static factory methods for creating common Source types.
From() - Constructor for class org.apache.crunch.io.From
 
fromBytes(byte[]) - Method in class org.apache.crunch.impl.spark.serde.AvroSerDe
 
fromBytes(byte[]) - Method in interface org.apache.crunch.impl.spark.serde.SerDe
 
fromBytes(byte[]) - Method in class org.apache.crunch.impl.spark.serde.WritableSerDe
 
fromBytesFunction() - Method in class org.apache.crunch.impl.spark.serde.AvroSerDe
 
fromBytesFunction() - Method in interface org.apache.crunch.impl.spark.serde.SerDe
 
fromBytesFunction() - Method in class org.apache.crunch.impl.spark.serde.WritableSerDe
 
fromConfiguration(Configuration) - Static method in class org.apache.crunch.types.avro.AvroMode
Creates an AvroMode based on the AvroMode.AVRO_MODE_PROPERTY property in the conf.
fromSerialized(String, Configuration) - Static method in class org.apache.crunch.io.FormatBundle
 
fromShuffleConfiguration(Configuration) - Static method in class org.apache.crunch.types.avro.AvroMode
Creates an AvroMode based on the AvroMode.AVRO_SHUFFLE_MODE_PROPERTY property in the conf.
fromType(AvroType<?>) - Static method in class org.apache.crunch.types.avro.AvroMode
Creates an AvroMode based upon the specified type.
fullJoin(PTable<K, U>, PTable<K, V>) - Static method in class org.apache.crunch.lib.Join
Performs a full outer join on the specified PTables.
FullOuterJoinFn<K,U,V> - Class in org.apache.crunch.lib.join
Used to perform the last step of a full outer join.
FullOuterJoinFn(PType<K>, PType<U>) - Constructor for class org.apache.crunch.lib.join.FullOuterJoinFn
 
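As a rough illustration of the full outer join semantics named in the fullJoin entry above, the following plain-Java sketch joins two in-memory maps by key. It does not use Crunch: the class FullOuterJoinSketch and its fullJoin method are hypothetical names, each key is assumed to carry a single value per side, and a missing side is represented here as null (an assumption of this sketch, not a statement about the library).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

// Plain-Java illustration only; none of these names come from Crunch.
class FullOuterJoinSketch {

    // Joins two maps by key. Every key from either side appears in the
    // result; a side that has no value for a key contributes null.
    static <K extends Comparable<K>, U, V> Map<K, SimpleEntry<U, V>> fullJoin(
            Map<K, U> left, Map<K, V> right) {
        // Union of the keys from both inputs, in sorted order.
        TreeSet<K> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());

        Map<K, SimpleEntry<U, V>> joined = new LinkedHashMap<>();
        for (K key : keys) {
            // Map.get returns null when the key is absent on that side.
            joined.put(key, new SimpleEntry<>(left.get(key), right.get(key)));
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, Integer> left = new LinkedHashMap<>();
        left.put("a", 1);
        left.put("b", 2);
        Map<String, String> right = new LinkedHashMap<>();
        right.put("b", "x");
        right.put("c", "y");
        // Keys "a" and "c" each appear with one null side; "b" has both.
        System.out.println(fullJoin(left, right).keySet());
    }
}
```

An inner join would keep only the keys present on both sides, and a left or right outer join would keep all keys from one side only.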

G

generateKeys(S) - Method in class org.apache.crunch.contrib.bloomfilter.BloomFilterFn
 
GENERIC - Static variable in class org.apache.crunch.types.avro.AvroMode
Default mode to use for reading and writing Generic types.
generics(Schema) - Static method in class org.apache.crunch.types.avro.Avros
 
generics(Schema) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
get() - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
get(long, TimeUnit) - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
get(int) - Method in class org.apache.crunch.Pair
 
get(int) - Method in interface org.apache.crunch.Tuple
Returns the Object at the given index.
get(int) - Method in class org.apache.crunch.Tuple3
 
get(int) - Method in class org.apache.crunch.Tuple4
 
get(int) - Method in class org.apache.crunch.TupleN
 
get(int) - Method in class org.apache.crunch.types.writable.TupleWritable
Get ith Writable from Tuple.
getByFn() - Method in class org.apache.crunch.lib.sort.SortFns.KeyExtraction
 
getCombineFn() - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
getConf() - Method in class org.apache.crunch.io.FormatBundle
 
getConf() - Method in class org.apache.crunch.lib.sort.TotalOrderPartitioner
 
getConf() - Method in class org.apache.crunch.lib.sort.TupleWritableComparator
 
getConf() - Method in class org.apache.crunch.util.CrunchTool
 
getConfiguration() - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
getConfiguration() - Method in class org.apache.crunch.impl.mem.MemPipeline
 
getConfiguration() - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
getConfiguration() - Method in class org.apache.crunch.impl.spark.SparkRuntimeContext
 
getConfiguration() - Method in interface org.apache.crunch.Pipeline
Returns the Configuration instance associated with this pipeline.
getConverter() - Method in interface org.apache.crunch.Source
Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.
getConverter(PType<?>) - Method in interface org.apache.crunch.Target
Returns the Converter to use for mapping from the output PCollection into the output values expected by this instance.
getConverter() - Method in class org.apache.crunch.types.avro.AvroType
 
getConverter() - Method in class org.apache.crunch.types.PGroupedTableType
 
getConverter() - Method in interface org.apache.crunch.types.PType
 
getConverter() - Method in class org.apache.crunch.types.writable.WritableType
 
getCounter(Enum<?>) - Static method in class org.apache.crunch.test.TestCounters
 
getCounter(String, String) - Static method in class org.apache.crunch.test.TestCounters
 
getCounter() - Method in class org.apache.hadoop.mapred.SparkCounter
 
getCounterDisplayName(String, String) - Method in class org.apache.crunch.PipelineResult.StageResult
 
getCounterDisplayName(Enum<?>) - Method in class org.apache.crunch.PipelineResult.StageResult
 
getCounterNames() - Method in class org.apache.crunch.PipelineResult.StageResult
 
getCounters() - Static method in class org.apache.crunch.impl.mem.MemPipeline
 
getCounters() - Method in class org.apache.crunch.PipelineResult.StageResult
Deprecated. The Counter class changed incompatibly between Hadoop 1 and 2 (from a class to an interface) so user programs should avoid this method and use PipelineResult.StageResult.getCounterNames().
getCounterValue(String, String) - Method in class org.apache.crunch.PipelineResult.StageResult
 
getCounterValue(Enum<?>) - Method in class org.apache.crunch.PipelineResult.StageResult
 
getData() - Method in class org.apache.crunch.types.avro.AvroMode
Returns a GenericData instance based on the mode type.
getData() - Method in interface org.apache.crunch.types.avro.ReaderWriterFactory
 
getData() - Method in class org.apache.crunch.types.avro.ReflectDataFactory
 
getDataFileWriter(Path, Configuration) - Static method in class org.apache.crunch.types.avro.AvroOutputFormat
 
getDefaultConfiguration() - Method in class org.apache.crunch.test.TemporaryPath
 
getDefaultFileSource(Path) - Method in class org.apache.crunch.types.avro.AvroType
 
getDefaultFileSource(Path) - Method in class org.apache.crunch.types.PGroupedTableType
 
getDefaultFileSource(Path) - Method in interface org.apache.crunch.types.PType
Returns a SourceTarget that is able to read/write data using the serialization format specified by this PType.
getDefaultFileSource(Path) - Method in class org.apache.crunch.types.writable.WritableType
 
getDefaultInstance() - Static method in class org.apache.crunch.contrib.text.TokenizerFactory
Returns a default TokenizerFactory that uses whitespace as a delimiter and does not skip any input fields.
getDefaultInstance(Class<M>) - Static method in class org.apache.crunch.types.Protos
Utility function for creating a default PB Message from a Class object that works with both protoc 2.3.0 and 2.4.x.
getDefaultValue() - Method in class org.apache.crunch.contrib.text.AbstractSimpleExtractor
 
getDefaultValue() - Method in interface org.apache.crunch.contrib.text.Extractor
Returns the default value for this Extractor in case of an error.
getDependentJobs() - Method in interface org.apache.crunch.impl.mr.MRJob
 
getDepth() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
getDetachedValue(PTableType<K, V>, Pair<K, V>) - Static method in class org.apache.crunch.lib.PTables
Create a detached value for a table Pair.
getDetachedValue(T) - Method in class org.apache.crunch.types.avro.AvroType
 
getDetachedValue(T) - Method in interface org.apache.crunch.types.PType
Returns a copy of a value (or the value itself) that can safely be retained.
getDetachedValue(T) - Method in class org.apache.crunch.types.writable.WritableType
 
getDisplayName() - Method in class org.apache.hadoop.mapred.SparkCounter
 
getEndTimeMsec() - Method in class org.apache.crunch.PipelineResult.StageResult
 
getErrorCount() - Method in class org.apache.crunch.contrib.text.ExtractorStats
The overall number of records that had some kind of parsing error.
getFactory() - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
getFactory() - Method in class org.apache.crunch.types.avro.AvroMode
Returns the factory that will be used for the mode.
getFamily() - Method in class org.apache.crunch.types.avro.AvroType
 
getFamily() - Method in class org.apache.crunch.types.PGroupedTableType
 
getFamily() - Method in interface org.apache.crunch.types.PType
Returns the PTypeFamily that this PType belongs to.
getFamily() - Method in class org.apache.crunch.types.writable.WritableType
 
getFieldErrors() - Method in class org.apache.crunch.contrib.text.ExtractorStats
Returns the number of errors that occurred when parsing the individual fields of a composite record type, like a Pair or TupleN.
getFile(String) - Method in class org.apache.crunch.test.TemporaryPath
Get a File below the temporary directory.
getFileName(String) - Method in class org.apache.crunch.test.TemporaryPath
Get an absolute file name below the temporary directory.
getFileNamingScheme() - Method in interface org.apache.crunch.io.PathTarget
Get the naming scheme to be used for outputs being written to an output path.
getFirst() - Method in class org.apache.crunch.fn.CompositeMapFn
 
getFormatClass() - Method in class org.apache.crunch.io.FormatBundle
 
getFormatNodeMap(JobContext) - Static method in class org.apache.crunch.io.CrunchInputs
 
getGroupedDetachedValue(PGroupedTableType<K, V>, Pair<K, Iterable<V>>) - Static method in class org.apache.crunch.lib.PTables
Create a detached value for a PGroupedTable value.
getGroupedTableType() - Method in class org.apache.crunch.impl.dist.collect.BaseGroupedTable
 
getGroupedTableType() - Method in interface org.apache.crunch.PGroupedTable
Return the PGroupedTableType containing serialization information for this PGroupedTable.
getGroupedTableType() - Method in interface org.apache.crunch.types.PTableType
Returns the grouped table version of this type.
getGroupingComparator(PTypeFamily) - Static method in class org.apache.crunch.lib.join.JoinUtils
 
getGroupingComparatorClass() - Method in class org.apache.crunch.GroupingOptions
 
getGroupingConverter() - Method in class org.apache.crunch.types.PGroupedTableType
 
getIndex() - Method in class org.apache.crunch.types.writable.UnionWritable
 
getIndex() - Method in class org.apache.crunch.Union
Returns the index of the original data source for this union type.
getInputMapFn() - Method in class org.apache.crunch.types.avro.AvroType
 
getInputMapFn() - Method in interface org.apache.crunch.types.PType
 
getInputMapFn() - Method in class org.apache.crunch.types.writable.WritableType
 
getInstance() - Static method in class org.apache.crunch.fn.IdentityFn
 
getInstance() - Static method in class org.apache.crunch.impl.mem.MemPipeline
 
getInstance() - Static method in class org.apache.crunch.io.SequentialFileNamingScheme
 
getInstance() - Static method in class org.apache.crunch.types.avro.AvroTypeFamily
 
getInstance() - Static method in class org.apache.crunch.types.writable.TupleWritable.Comparator
 
getInstance() - Static method in class org.apache.crunch.types.writable.WritableTypeFamily
 
getJavaRDDLike(SparkRuntime) - Method in class org.apache.crunch.impl.spark.collect.DoCollection
 
getJavaRDDLike(SparkRuntime) - Method in class org.apache.crunch.impl.spark.collect.DoTable
 
getJavaRDDLike(SparkRuntime) - Method in class org.apache.crunch.impl.spark.collect.EmptyPCollection
 
getJavaRDDLike(SparkRuntime) - Method in class org.apache.crunch.impl.spark.collect.EmptyPTable
 
getJavaRDDLike(SparkRuntime) - Method in class org.apache.crunch.impl.spark.collect.InputCollection
 
getJavaRDDLike(SparkRuntime) - Method in class org.apache.crunch.impl.spark.collect.InputTable
 
getJavaRDDLike(SparkRuntime) - Method in class org.apache.crunch.impl.spark.collect.PGroupedTableImpl
 
getJavaRDDLike(SparkRuntime) - Method in class org.apache.crunch.impl.spark.collect.UnionCollection
 
getJavaRDDLike(SparkRuntime) - Method in class org.apache.crunch.impl.spark.collect.UnionTable
 
getJavaRDDLike(SparkRuntime) - Method in interface org.apache.crunch.impl.spark.SparkCollection
 
getJob() - Method in interface org.apache.crunch.impl.mr.MRJob
 
getJobEndTimeMsec() - Method in class org.apache.crunch.PipelineResult.StageResult
 
getJobID() - Method in interface org.apache.crunch.impl.mr.MRJob
 
getJobs() - Method in interface org.apache.crunch.impl.mr.MRPipelineExecution
 
getJobStartTimeMsec() - Method in class org.apache.crunch.PipelineResult.StageResult
 
getJobState() - Method in interface org.apache.crunch.impl.mr.MRJob
 
getJoinType() - Method in class org.apache.crunch.lib.join.FullOuterJoinFn
getJoinType() - Method in class org.apache.crunch.lib.join.InnerJoinFn
 
getJoinType() - Method in class org.apache.crunch.lib.join.JoinFn
 
getJoinType() - Method in class org.apache.crunch.lib.join.LeftOuterJoinFn
getJoinType() - Method in class org.apache.crunch.lib.join.RightOuterJoinFn
getKeyClass() - Method in interface org.apache.crunch.types.Converter
 
getKeyType() - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
getKeyType() - Method in class org.apache.crunch.lib.sort.SortFns.KeyExtraction
 
getKeyType() - Method in interface org.apache.crunch.PTable
Returns the PType of the key.
getKeyType() - Method in interface org.apache.crunch.types.PTableType
Returns the key type for the table.
getLastModifiedAt(Configuration) - Method in class org.apache.crunch.contrib.io.jdbc.DataBaseSource
 
getLastModifiedAt() - Method in class org.apache.crunch.impl.dist.collect.BaseDoCollection
 
getLastModifiedAt() - Method in class org.apache.crunch.impl.dist.collect.BaseDoTable
 
getLastModifiedAt() - Method in class org.apache.crunch.impl.dist.collect.BaseGroupedTable
 
getLastModifiedAt() - Method in class org.apache.crunch.impl.dist.collect.BaseInputCollection
 
getLastModifiedAt() - Method in class org.apache.crunch.impl.dist.collect.BaseInputTable
 
getLastModifiedAt() - Method in class org.apache.crunch.impl.dist.collect.BaseUnionCollection
 
getLastModifiedAt() - Method in class org.apache.crunch.impl.dist.collect.BaseUnionTable
 
getLastModifiedAt() - Method in class org.apache.crunch.impl.dist.collect.EmptyPCollection
 
getLastModifiedAt() - Method in class org.apache.crunch.impl.dist.collect.EmptyPTable
 
getLastModifiedAt() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
The time of the most recent modification to one of the input sources to the collection.
getLastModifiedAt(FileSystem, Path) - Static method in class org.apache.crunch.io.SourceTargetHelper
 
getLastModifiedAt(Configuration) - Method in interface org.apache.crunch.Source
Returns the time (in milliseconds) that this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory).
getMapOutputName(Configuration, Path) - Method in interface org.apache.crunch.io.FileNamingScheme
Get the output file name for a map task.
getMapOutputName(Configuration, Path) - Method in class org.apache.crunch.io.SequentialFileNamingScheme
 
getMaterializedAt() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
getMaterializeSourceTarget(PCollection<T>) - Method in class org.apache.crunch.impl.dist.DistributedPipeline
Retrieve a ReadableSourceTarget that provides access to the contents of a PCollection.
getName() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
getName() - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
getName() - Method in class org.apache.crunch.impl.mem.MemPipeline
 
getName() - Method in class org.apache.crunch.io.FormatBundle
 
getName() - Method in interface org.apache.crunch.PCollection
Returns a shorthand name for this PCollection.
getName() - Method in interface org.apache.crunch.Pipeline
Returns the name of this pipeline.
getName() - Method in class org.apache.hadoop.mapred.SparkCounter
 
getNextAnonymousStageId() - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
getNumReducers() - Method in class org.apache.crunch.GroupingOptions
 
getNumShards(K) - Method in interface org.apache.crunch.lib.join.ShardedJoinStrategy.ShardingStrategy
Retrieve the number of shards over which the given key should be split.
getOnlyParent() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
getOutputMapFn() - Method in class org.apache.crunch.types.avro.AvroType
 
getOutputMapFn() - Method in interface org.apache.crunch.types.PType
 
getOutputMapFn() - Method in class org.apache.crunch.types.writable.WritableType
 
getParallelDoOptions() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
getParents() - Method in class org.apache.crunch.impl.dist.collect.BaseDoCollection
 
getParents() - Method in class org.apache.crunch.impl.dist.collect.BaseDoTable
 
getParents() - Method in class org.apache.crunch.impl.dist.collect.BaseGroupedTable
 
getParents() - Method in class org.apache.crunch.impl.dist.collect.BaseInputCollection
 
getParents() - Method in class org.apache.crunch.impl.dist.collect.BaseInputTable
 
getParents() - Method in class org.apache.crunch.impl.dist.collect.BaseUnionCollection
 
getParents() - Method in class org.apache.crunch.impl.dist.collect.BaseUnionTable
 
getParents() - Method in class org.apache.crunch.impl.dist.collect.EmptyPCollection
 
getParents() - Method in class org.apache.crunch.impl.dist.collect.EmptyPTable
 
getParents() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
getPartition(Object) - Method in class org.apache.crunch.impl.spark.SparkPartitioner
 
getPartition(Object, Object, int) - Method in class org.apache.crunch.lib.join.JoinUtils.AvroIndexedRecordPartitioner
 
getPartition(TupleWritable, Writable, int) - Method in class org.apache.crunch.lib.join.JoinUtils.TupleWritablePartitioner
 
getPartition(K, V, int) - Method in class org.apache.crunch.lib.sort.TotalOrderPartitioner
 
getPartitionerClass() - Method in class org.apache.crunch.GroupingOptions
 
getPartitionerClass(PTypeFamily) - Static method in class org.apache.crunch.lib.join.JoinUtils
 
getPartitionFile(Configuration) - Static method in class org.apache.crunch.lib.sort.TotalOrderPartitioner
 
getPath() - Method in interface org.apache.crunch.io.PathTarget
 
getPath(String) - Method in class org.apache.crunch.test.TemporaryPath
Get a Path below the temporary directory.
getPathSize(Configuration, Path) - Static method in class org.apache.crunch.io.SourceTargetHelper
 
getPathSize(FileSystem, Path) - Static method in class org.apache.crunch.io.SourceTargetHelper
 
getPathToCacheFile(Path, Configuration) - Static method in class org.apache.crunch.util.DistCache
 
getPipeline() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
getPipeline() - Method in interface org.apache.crunch.PCollection
Returns the Pipeline associated with this PCollection.
getPlanDotFile() - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
getPlanDotFile() - Method in interface org.apache.crunch.PipelineExecution
Returns the .dot file that allows a client to graph the Crunch execution plan for this pipeline.
getPrimitiveType(Class<T>) - Static method in class org.apache.crunch.types.avro.Avros
 
getPrimitiveType(Class<T>) - Static method in class org.apache.crunch.types.writable.Writables
 
getPTableType() - Method in class org.apache.crunch.impl.dist.collect.BaseDoTable
 
getPTableType() - Method in class org.apache.crunch.impl.dist.collect.BaseInputTable
 
getPTableType() - Method in class org.apache.crunch.impl.dist.collect.BaseUnionTable
 
getPTableType() - Method in class org.apache.crunch.impl.dist.collect.EmptyPTable
 
getPTableType() - Method in interface org.apache.crunch.PTable
Returns the PTableType of this PTable.
getPType(PTypeFamily) - Method in interface org.apache.crunch.contrib.text.Extractor
Returns the PType associated with this data type for the given PTypeFamily.
getPType() - Method in class org.apache.crunch.impl.dist.collect.BaseDoCollection
 
getPType() - Method in class org.apache.crunch.impl.dist.collect.BaseDoTable
 
getPType() - Method in class org.apache.crunch.impl.dist.collect.BaseGroupedTable
 
getPType() - Method in class org.apache.crunch.impl.dist.collect.BaseInputCollection
 
getPType() - Method in class org.apache.crunch.impl.dist.collect.BaseInputTable
 
getPType() - Method in class org.apache.crunch.impl.dist.collect.BaseUnionCollection
 
getPType() - Method in class org.apache.crunch.impl.dist.collect.BaseUnionTable
 
getPType() - Method in class org.apache.crunch.impl.dist.collect.EmptyPCollection
 
getPType() - Method in class org.apache.crunch.impl.dist.collect.EmptyPTable
 
getPType() - Method in interface org.apache.crunch.PCollection
Returns the PType of this PCollection.
getReader(Schema) - Method in class org.apache.crunch.types.avro.AvroMode
Creates a DatumReader based on the schema.
getReader(Schema) - Method in interface org.apache.crunch.types.avro.ReaderWriterFactory
 
getReader(Schema) - Method in class org.apache.crunch.types.avro.ReflectDataFactory
 
getRecommendedPartitions(PCollection<T>) - Static method in class org.apache.crunch.util.PartitionUtils
 
getRecommendedPartitions(PCollection<T>, Configuration) - Static method in class org.apache.crunch.util.PartitionUtils
 
getRecordType() - Method in class org.apache.crunch.types.avro.AvroType
 
getRecordWriter(TaskAttemptContext) - Method in class org.apache.crunch.types.avro.AvroOutputFormat
 
getRecordWriter(TaskAttemptContext) - Method in class org.apache.crunch.types.avro.AvroPathPerKeyOutputFormat
 
getRecordWriter(TaskAttemptContext) - Method in class org.apache.crunch.types.avro.AvroTextOutputFormat
 
getReduceOutputName(Configuration, Path, int) - Method in interface org.apache.crunch.io.FileNamingScheme
Get the output file name for a reduce task.
getReduceOutputName(Configuration, Path, int) - Method in class org.apache.crunch.io.SequentialFileNamingScheme
 
getReflectDataFactory(Configuration) - Static method in class org.apache.crunch.types.avro.Avros
Deprecated. As of 0.9.0; use AvroMode.fromConfiguration(conf).
getResult() - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
getResult() - Method in interface org.apache.crunch.PipelineExecution
Retrieve the result of a pipeline if it has been completed, otherwise null.
getRootFile() - Method in class org.apache.crunch.test.TemporaryPath
Get the root directory which will be deleted automatically.
getRootFileName() - Method in class org.apache.crunch.test.TemporaryPath
Get the root directory as an absolute file name.
getRootPath() - Method in class org.apache.crunch.test.TemporaryPath
Get the root directory as a Path.
getRuntimeContext() - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
getSchema() - Method in class org.apache.crunch.types.avro.AvroType
 
getSecond() - Method in class org.apache.crunch.fn.CompositeMapFn
 
getSerializationClass() - Method in class org.apache.crunch.types.writable.WritableType
 
getSize(Configuration) - Method in class org.apache.crunch.contrib.io.jdbc.DataBaseSource
 
getSize() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
getSize() - Method in interface org.apache.crunch.PCollection
Returns the size of the data represented by this PCollection in bytes.
getSize(Configuration) - Method in interface org.apache.crunch.Source
Returns the number of bytes in this Source.
getSortComparatorClass() - Method in class org.apache.crunch.GroupingOptions
 
getSource() - Method in class org.apache.crunch.impl.dist.collect.BaseInputCollection
 
getSource() - Method in class org.apache.crunch.impl.dist.collect.BaseInputTable
 
getSourceTargets() - Method in class org.apache.crunch.GroupingOptions
 
getSourceTargets() - Method in class org.apache.crunch.ParallelDoOptions
 
getSourceTargets() - Method in interface org.apache.crunch.ReadableData
 
getSourceTargets() - Method in class org.apache.crunch.util.DelegatingReadableData
 
getSourceTargets() - Method in class org.apache.crunch.util.UnionReadableData
 
getSparkContext() - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
getStageId() - Method in class org.apache.crunch.PipelineResult.StageResult
 
getStageName() - Method in class org.apache.crunch.PipelineResult.StageResult
 
getStageResults() - Method in class org.apache.crunch.PipelineResult
 
getStartTimeMsec() - Method in class org.apache.crunch.PipelineResult.StageResult
 
getStats() - Method in class org.apache.crunch.contrib.text.AbstractCompositeExtractor
 
getStats() - Method in class org.apache.crunch.contrib.text.AbstractSimpleExtractor
 
getStats() - Method in interface org.apache.crunch.contrib.text.Extractor
Return statistics about how many errors this Extractor instance encountered while parsing input data.
getStatus() - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
getStatus() - Method in interface org.apache.crunch.PipelineExecution
 
getStorageLevel(PCollection<?>) - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
getSubTypes() - Method in class org.apache.crunch.types.avro.AvroType
 
getSubTypes() - Method in class org.apache.crunch.types.PGroupedTableType
 
getSubTypes() - Method in interface org.apache.crunch.types.PType
Returns the sub-types that make up this PType if it is a composite instance, such as a tuple.
getSubTypes() - Method in class org.apache.crunch.types.writable.WritableType
 
getTableType() - Method in interface org.apache.crunch.TableSource
 
getTableType() - Method in class org.apache.crunch.types.PGroupedTableType
 
getTargetDependencies() - Method in class org.apache.crunch.impl.dist.collect.BaseGroupedTable
 
getTargetDependencies() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
getTestContext(Configuration) - Static method in class org.apache.crunch.test.CrunchTestSupport
The method creates a TaskInputOutputContext which can be used in unit tests.
getTupleFactory(Class<T>) - Static method in class org.apache.crunch.types.TupleFactory
Get the TupleFactory for a given Tuple implementation.
getType() - Method in interface org.apache.crunch.Source
Returns the PType for this source.
getTypeClass() - Method in class org.apache.crunch.types.avro.AvroType
 
getTypeClass() - Method in interface org.apache.crunch.types.PType
Returns the Java type represented by this PType.
getTypeClass() - Method in class org.apache.crunch.types.writable.WritableType
 
getTypeFamily() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
getTypeFamily() - Method in interface org.apache.crunch.PCollection
Returns the PTypeFamily of this PCollection.
getValue() - Method in interface org.apache.crunch.PObject
Gets the value associated with this PObject.
getValue() - Method in class org.apache.crunch.types.writable.UnionWritable
 
getValue() - Method in class org.apache.crunch.Union
Returns the underlying object value of the record.
getValue() - Method in class org.apache.hadoop.mapred.SparkCounter
 
getValueClass() - Method in interface org.apache.crunch.types.Converter
 
getValueType() - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
getValueType() - Method in interface org.apache.crunch.PTable
Returns the PType of the value.
getValueType() - Method in interface org.apache.crunch.types.PTableType
Returns the value type for the table.
getWriter(Schema) - Method in class org.apache.crunch.types.avro.AvroMode
Creates a DatumWriter based on the schema.
getWriter(Schema) - Method in interface org.apache.crunch.types.avro.ReaderWriterFactory
 
getWriter(Schema) - Method in class org.apache.crunch.types.avro.ReflectDataFactory
 
groupByKey() - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
groupByKey(int) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
groupByKey(GroupingOptions) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
groupByKey() - Method in interface org.apache.crunch.PTable
Performs a grouping operation on the keys of this table.
groupByKey(int) - Method in interface org.apache.crunch.PTable
Performs a grouping operation on the keys of this table, using the given number of partitions.
groupByKey(GroupingOptions) - Method in interface org.apache.crunch.PTable
Performs a grouping operation on the keys of this table, using the additional GroupingOptions to control how the grouping is executed.
groupedWeightedReservoirSample(PTable<Integer, Pair<T, N>>, int[]) - Static method in class org.apache.crunch.lib.Sample
The most general-purpose of the weighted reservoir sampling patterns, which allows us to choose a random sample of elements for each of N input groups.
groupedWeightedReservoirSample(PTable<Integer, Pair<T, N>>, int[], Long) - Static method in class org.apache.crunch.lib.Sample
Same as the other groupedWeightedReservoirSample method, but includes a seed for testing purposes.
groupingComparatorClass(Class<? extends RawComparator>) - Method in class org.apache.crunch.GroupingOptions.Builder
 
GroupingOptions - Class in org.apache.crunch
Options that can be passed to a groupByKey operation in order to exercise finer control over how the partitioning, grouping, and sorting of keys is performed.
GroupingOptions.Builder - Class in org.apache.crunch
Builder class for creating GroupingOptions instances.
GroupingOptions.Builder() - Constructor for class org.apache.crunch.GroupingOptions.Builder
 
GuavaUtils - Class in org.apache.crunch.impl.spark
 
GuavaUtils() - Constructor for class org.apache.crunch.impl.spark.GuavaUtils
 
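The groupByKey entries above describe a grouping operation on the keys of a table. As a minimal in-memory sketch of that idea, the plain-Java code below collects the values of key-value pairs under their keys. It does not use Crunch: GroupByKeySketch and its groupByKey method are hypothetical names, and the sorted TreeMap is an assumption of this sketch chosen to give a stable iteration order. The real operation is distributed and is configured through GroupingOptions.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names come from Crunch.
class GroupByKeySketch {

    // Collects all values that share a key into one list per key,
    // preserving the order in which the values were encountered.
    static <K extends Comparable<K>, V> Map<K, List<V>> groupByKey(
            List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> grouped = new TreeMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("a", 1));
        pairs.add(new SimpleEntry<>("b", 2));
        pairs.add(new SimpleEntry<>("a", 3));
        System.out.println(groupByKey(pairs)); // prints {a=[1, 3], b=[2]}
    }
}
```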

H

handleExisting(Target.WriteMode, long, Configuration) - Method in interface org.apache.crunch.Target
Apply the given WriteMode to this Target instance.
handleOutputs(Configuration, Path, int) - Method in interface org.apache.crunch.io.PathTarget
Handles moving the output data for this target from a temporary location on the filesystem to its target path at the end of a MapReduce job.
has(int) - Method in class org.apache.crunch.types.writable.TupleWritable
Return true if tuple has an element at the position provided.
hashCode() - Method in class org.apache.crunch.impl.dist.collect.BaseInputCollection
 
hashCode() - Method in class org.apache.crunch.impl.dist.collect.BaseInputTable
 
hashCode() - Method in class org.apache.crunch.impl.spark.ByteArray
 
hashCode() - Method in class org.apache.crunch.impl.spark.IntByteArray
 
hashCode() - Method in class org.apache.crunch.io.FormatBundle
 
hashCode() - Method in class org.apache.crunch.Pair
 
hashCode() - Method in class org.apache.crunch.Tuple3
 
hashCode() - Method in class org.apache.crunch.Tuple4
 
hashCode() - Method in class org.apache.crunch.TupleN
 
hashCode() - Method in class org.apache.crunch.types.avro.AvroMode
 
hashCode() - Method in class org.apache.crunch.types.avro.AvroType
 
hashCode() - Method in class org.apache.crunch.types.writable.TupleWritable
 
hashCode() - Method in class org.apache.crunch.types.writable.WritableType
 
hashCode() - Method in class org.apache.crunch.Union
 
hasNext() - Method in class org.apache.crunch.contrib.text.Tokenizer
Returns true if the underlying Scanner has any tokens remaining.
hasNext() - Method in class org.apache.crunch.util.DoFnIterator
 
hasReflect() - Method in class org.apache.crunch.types.avro.AvroType
Determine if the wrapped type is a reflection-based avro type or wraps one.
hasSpecific() - Method in class org.apache.crunch.types.avro.AvroType
Determine if the wrapped type is a specific data avro type or wraps one.
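
The Tokenizer.hasNext() entry above refers to an underlying Scanner. As a small sketch of that pattern, the code below uses java.util.Scanner directly, with its default whitespace delimiter, to pull tokens from a line of text. ScannerTokensSketch and tokens are hypothetical names, and the code illustrates only the standard Scanner API, not the Crunch Tokenizer itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Plain-Java illustration only; this is not the Crunch Tokenizer class.
class ScannerTokensSketch {

    // Splits a line into tokens using Scanner's default whitespace delimiter.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            // hasNext() is true while at least one more token remains.
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("  foo bar\tbaz ")); // prints [foo, bar, baz]
    }
}
```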

I

id - Variable in class org.apache.crunch.contrib.io.jdbc.IdentifiableName
 
IdentifiableName - Class in org.apache.crunch.contrib.io.jdbc
 
IdentifiableName() - Constructor for class org.apache.crunch.contrib.io.jdbc.IdentifiableName
 
IdentityFn<T> - Class in org.apache.crunch.fn
 
immutableType(Class<T>, Class<W>, MapFn<W, T>, MapFn<T, W>, PType...) - Static method in class org.apache.crunch.types.writable.WritableType
Factory method for a new WritableType instance whose type class is immutable.
increment(long) - Method in class org.apache.hadoop.mapred.SparkCounter
 
initialize(Configuration) - Method in interface org.apache.crunch.Aggregator
Perform any setup of this instance that is required prior to processing inputs.
initialize() - Method in class org.apache.crunch.contrib.bloomfilter.BloomFilterFn
 
initialize() - Method in class org.apache.crunch.contrib.text.AbstractCompositeExtractor
 
initialize() - Method in class org.apache.crunch.contrib.text.AbstractSimpleExtractor
 
initialize() - Method in interface org.apache.crunch.contrib.text.Extractor
Perform any initialization required by this Extractor during the start of a map or reduce task.
initialize() - Method in class org.apache.crunch.DoFn
Initialize this DoFn.
initialize(Configuration) - Method in class org.apache.crunch.fn.Aggregators.SimpleAggregator
 
initialize() - Method in class org.apache.crunch.fn.CompositeMapFn
 
initialize() - Method in class org.apache.crunch.fn.ExtractKeyFn
 
initialize() - Method in class org.apache.crunch.fn.PairMapFn
 
initialize(DoFn<?, ?>) - Method in class org.apache.crunch.impl.spark.SparkRuntimeContext
 
initialize() - Method in class org.apache.crunch.lib.Aggregate.TopKCombineFn
 
initialize() - Method in class org.apache.crunch.lib.Aggregate.TopKFn
 
initialize() - Method in class org.apache.crunch.lib.join.FullOuterJoinFn
Initialize this DoFn.
initialize() - Method in class org.apache.crunch.lib.join.InnerJoinFn
 
initialize() - Method in class org.apache.crunch.lib.join.JoinFn
 
initialize() - Method in class org.apache.crunch.lib.join.LeftOuterJoinFn
Initialize this DoFn.
initialize() - Method in class org.apache.crunch.lib.join.RightOuterJoinFn
Initialize this DoFn.
initialize() - Method in class org.apache.crunch.lib.sort.SortFns.AvroGenericFn
 
initialize(Configuration) - Method in class org.apache.crunch.types.avro.AvroDerivedValueDeepCopier
 
initialize(Configuration) - Method in class org.apache.crunch.types.avro.AvroType
 
initialize(Configuration) - Method in class org.apache.crunch.types.CollectionDeepCopier
 
initialize(Configuration) - Method in interface org.apache.crunch.types.DeepCopier
Initialize the deep copier with a job-specific configuration.
initialize(Configuration) - Method in class org.apache.crunch.types.MapDeepCopier
 
initialize(Configuration) - Method in class org.apache.crunch.types.NoOpDeepCopier
 
initialize() - Method in class org.apache.crunch.types.PGroupedTableType.PairIterableMapFn
 
initialize(Configuration) - Method in interface org.apache.crunch.types.PType
Initialize this PType for use within a DoFn.
initialize(Configuration) - Method in class org.apache.crunch.types.TupleDeepCopier
 
initialize() - Method in class org.apache.crunch.types.TupleFactory
 
initialize(Configuration) - Method in class org.apache.crunch.types.UnionDeepCopier
 
initialize(Configuration) - Method in class org.apache.crunch.types.writable.WritableDeepCopier
 
initialize(Configuration) - Method in class org.apache.crunch.types.writable.WritableType
 
innerJoin(PTable<K, U>, PTable<K, V>) - Static method in class org.apache.crunch.lib.Join
Performs an inner join on the specified PTables.
InnerJoinFn<K,U,V> - Class in org.apache.crunch.lib.join
Used to perform the last step of an inner join.
InnerJoinFn(PType<K>, PType<U>) - Constructor for class org.apache.crunch.lib.join.InnerJoinFn
 
InputCollection<S> - Class in org.apache.crunch.impl.spark.collect
 
inputConf(String, String) - Method in interface org.apache.crunch.Source
Adds the given key-value pair to the Configuration instance that is used to read this Source<T>.
InputConverterFunction<K,V,S> - Class in org.apache.crunch.impl.spark.fn
 
InputConverterFunction(Converter<K, V, S, ?>) - Constructor for class org.apache.crunch.impl.spark.fn.InputConverterFunction
 
InputTable<K,V> - Class in org.apache.crunch.impl.spark.collect
 
InputTable(TableSource<K, V>, DistributedPipeline) - Constructor for class org.apache.crunch.impl.spark.collect.InputTable
 
IntByteArray - Class in org.apache.crunch.impl.spark
 
IntByteArray(int, byte[]) - Constructor for class org.apache.crunch.impl.spark.IntByteArray
 
intersection(PCollection<T>, PCollection<T>) - Static method in class org.apache.crunch.lib.Set
Compute the intersection of two sets of elements.
ints() - Static method in class org.apache.crunch.types.avro.Avros
 
ints() - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
ints() - Method in interface org.apache.crunch.types.PTypeFamily
 
ints() - Static method in class org.apache.crunch.types.writable.Writables
 
ints() - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
isBreakpoint() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
isCompatibleWith(GroupingOptions) - Method in class org.apache.crunch.GroupingOptions
 
isGeneric() - Method in class org.apache.crunch.types.avro.AvroType
Determine if the wrapped type is a generic data avro type.
isValid(JavaRDDLike<?, ?>) - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
iterator() - Method in class org.apache.crunch.impl.SingleUseIterable
 
iterator() - Method in class org.apache.crunch.impl.spark.fn.CrunchIterable
 
iterator() - Method in class org.apache.crunch.io.CompositePathIterable
 
iterator() - Method in class org.apache.crunch.util.Tuples.PairIterable
 
iterator() - Method in class org.apache.crunch.util.Tuples.QuadIterable
 
iterator() - Method in class org.apache.crunch.util.Tuples.TripIterable
 
iterator() - Method in class org.apache.crunch.util.Tuples.TupleNIterable
 

J

join(PTable<K, U>) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
Join - Class in org.apache.crunch.lib
Utilities for joining multiple PTable instances based on a common key.
Join() - Constructor for class org.apache.crunch.lib.Join
 
join(PTable<K, U>, PTable<K, V>, JoinType) - Method in class org.apache.crunch.lib.join.BloomFilterJoinStrategy
 
join(PTable<K, U>, PTable<K, V>, JoinType) - Method in class org.apache.crunch.lib.join.DefaultJoinStrategy
 
join(PTable<K, U>, PTable<K, V>, JoinFn<K, U, V>) - Method in class org.apache.crunch.lib.join.DefaultJoinStrategy
Perform a default join on the given PTable instances using a user-specified JoinFn.
join(K, int, Iterable<Pair<U, V>>, Emitter<Pair<K, Pair<U, V>>>) - Method in class org.apache.crunch.lib.join.FullOuterJoinFn
Performs the actual joining.
join(K, int, Iterable<Pair<U, V>>, Emitter<Pair<K, Pair<U, V>>>) - Method in class org.apache.crunch.lib.join.InnerJoinFn
 
join(PTable<K, U>, PTable<K, V>) - Static method in class org.apache.crunch.lib.Join
Performs an inner join on the specified PTables.
join(K, int, Iterable<Pair<U, V>>, Emitter<Pair<K, Pair<U, V>>>) - Method in class org.apache.crunch.lib.join.JoinFn
Performs the actual joining.
join(PTable<K, U>, PTable<K, V>, JoinType) - Method in interface org.apache.crunch.lib.join.JoinStrategy
Join two tables with the given join type.
join(K, int, Iterable<Pair<U, V>>, Emitter<Pair<K, Pair<U, V>>>) - Method in class org.apache.crunch.lib.join.LeftOuterJoinFn
Performs the actual joining.
join(PTable<K, U>, PTable<K, V>, JoinType) - Method in class org.apache.crunch.lib.join.MapsideJoinStrategy
 
join(K, int, Iterable<Pair<U, V>>, Emitter<Pair<K, Pair<U, V>>>) - Method in class org.apache.crunch.lib.join.RightOuterJoinFn
Performs the actual joining.
join(PTable<K, U>, PTable<K, V>, JoinType) - Method in class org.apache.crunch.lib.join.ShardedJoinStrategy
 
join(PTable<K, U>) - Method in interface org.apache.crunch.PTable
Perform an inner join on this table and the one passed in as an argument on their common keys.
JoinFn<K,U,V> - Class in org.apache.crunch.lib.join
Represents a DoFn for performing joins.
JoinFn(PType<K>, PType<U>) - Constructor for class org.apache.crunch.lib.join.JoinFn
Instantiate with the PType of the value of the left side of the join (used for creating deep copies of values).
JoinStrategy<K,U,V> - Interface in org.apache.crunch.lib.join
Defines a strategy for joining two PTables together on a common key.
JoinType - Enum in org.apache.crunch.lib.join
Specifies how a join should be performed in terms of requiring matching keys on both sides of the join.
JoinUtils - Class in org.apache.crunch.lib.join
Utilities that are useful in joining multiple data sets via a MapReduce.
JoinUtils() - Constructor for class org.apache.crunch.lib.join.JoinUtils
 
JoinUtils.AvroIndexedRecordPartitioner - Class in org.apache.crunch.lib.join
 
JoinUtils.AvroIndexedRecordPartitioner() - Constructor for class org.apache.crunch.lib.join.JoinUtils.AvroIndexedRecordPartitioner
 
JoinUtils.AvroPairGroupingComparator<T> - Class in org.apache.crunch.lib.join
 
JoinUtils.AvroPairGroupingComparator() - Constructor for class org.apache.crunch.lib.join.JoinUtils.AvroPairGroupingComparator
 
JoinUtils.TupleWritableComparator - Class in org.apache.crunch.lib.join
 
JoinUtils.TupleWritableComparator() - Constructor for class org.apache.crunch.lib.join.JoinUtils.TupleWritableComparator
 
JoinUtils.TupleWritablePartitioner - Class in org.apache.crunch.lib.join
 
JoinUtils.TupleWritablePartitioner() - Constructor for class org.apache.crunch.lib.join.JoinUtils.TupleWritablePartitioner
 
jsons(Class<T>) - Static method in class org.apache.crunch.types.avro.Avros
 
jsons(Class<T>) - Static method in class org.apache.crunch.types.writable.Writables
 
jsonString(Class<T>, PTypeFamily) - Static method in class org.apache.crunch.types.PTypes
Constructs a PType for reading a Java type from a JSON string using Jackson's ObjectMapper.
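The join entries above distinguish inner, left outer, right outer, and full outer joins by which side must contain a matching key. A minimal sketch of that distinction in plain Java follows. It does not use the Crunch API: the class `JoinSemanticsSketch`, the enum `Kind`, and the `join` helper are hypothetical names for illustration, and it assumes at most one value per key on each side.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class JoinSemanticsSketch {
    // Hypothetical stand-in for the four join behaviours named in the index.
    enum Kind { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // Joins two in-memory maps on their keys. A side with no match
    // contributes null to the resulting pair.
    static <K extends Comparable<K>, U, V> Map<K, SimpleEntry<U, V>> join(
            Map<K, U> left, Map<K, V> right, Kind kind) {
        Set<K> keys = new LinkedHashSet<>(left.keySet());
        keys.addAll(right.keySet());
        Map<K, SimpleEntry<U, V>> out = new TreeMap<>();
        for (K key : keys) {
            boolean inLeft = left.containsKey(key);
            boolean inRight = right.containsKey(key);
            boolean keep;
            switch (kind) {
                case INNER:       keep = inLeft && inRight; break;
                case LEFT_OUTER:  keep = inLeft;            break;
                case RIGHT_OUTER: keep = inRight;           break;
                default:          keep = true;              break; // FULL_OUTER
            }
            if (keep) {
                out.put(key, new SimpleEntry<>(left.get(key), right.get(key)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> left = new HashMap<>();
        left.put("a", 1);
        left.put("b", 2);
        Map<String, String> right = new HashMap<>();
        right.put("b", "x");
        right.put("c", "y");
        for (Kind kind : Kind.values()) {
            System.out.println(kind + " -> " + join(left, right, kind));
        }
    }
}
```

In a real pipeline the same choice would be made through the Join, JoinStrategy, and JoinType entries listed above, operating on PTable instances rather than in-memory maps.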

K

keep(Integer...) - Method in class org.apache.crunch.contrib.text.TokenizerFactory.Builder
Keep only the specified fields found by the input scanner, counting from zero.
keys() - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
keys(PTable<K, V>) - Static method in class org.apache.crunch.lib.PTables
Extract the keys from the given PTable<K, V> as a PCollection<K>.
keys() - Method in interface org.apache.crunch.PTable
Returns a PCollection made up of the keys in this PTable.
kill() - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
kill() - Method in interface org.apache.crunch.PipelineExecution
Kills the pipeline if it is running, no-op otherwise.

L

LAST_N(int) - Static method in class org.apache.crunch.fn.Aggregators
Return the last n values (or fewer if there are fewer values than n).
leftJoin(PTable<K, U>, PTable<K, V>) - Static method in class org.apache.crunch.lib.Join
Performs a left outer join on the specified PTables.
LeftOuterJoinFn<K,U,V> - Class in org.apache.crunch.lib.join
Used to perform the last step of a left outer join.
LeftOuterJoinFn(PType<K>, PType<U>) - Constructor for class org.apache.crunch.lib.join.LeftOuterJoinFn
 
length() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
length(PCollection<S>) - Static method in class org.apache.crunch.lib.Aggregate
Returns the number of elements in the provided PCollection.
length() - Method in interface org.apache.crunch.PCollection
Returns the number of elements represented by this PCollection.
lineParser(String, Class<M>) - Static method in class org.apache.crunch.types.Protos
 
locale(Locale) - Method in class org.apache.crunch.contrib.text.TokenizerFactory.Builder
Sets the Locale to use with the TokenizerFactory returned by this Builder instance.
longs() - Static method in class org.apache.crunch.types.avro.Avros
 
longs() - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
longs() - Method in interface org.apache.crunch.types.PTypeFamily
 
longs() - Static method in class org.apache.crunch.types.writable.Writables
 
longs() - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 

M

main(String[]) - Static method in class org.apache.crunch.examples.AverageBytesByIP
 
main(String[]) - Static method in class org.apache.crunch.examples.SecondarySortExample
 
main(String[]) - Static method in class org.apache.crunch.examples.SortExample
 
main(String[]) - Static method in class org.apache.crunch.examples.TotalBytesByIP
 
main(String[]) - Static method in class org.apache.crunch.examples.TotalWordCount
 
main(String[]) - Static method in class org.apache.crunch.examples.WordAggregationHBase
 
main(String[]) - Static method in class org.apache.crunch.examples.WordCount
 
makeTuple(Object...) - Method in class org.apache.crunch.types.TupleFactory
 
map(R) - Method in class org.apache.crunch.fn.CompositeMapFn
 
map(V) - Method in class org.apache.crunch.fn.ExtractKeyFn
 
map(T) - Method in class org.apache.crunch.fn.IdentityFn
 
map(Pair<K, V>) - Method in class org.apache.crunch.fn.PairMapFn
 
map(PTable<K1, V1>, Class<? extends Mapper<K1, V1, K2, V2>>, Class<K2>, Class<V2>) - Static method in class org.apache.crunch.lib.Mapred
 
map(PTable<K1, V1>, Class<? extends Mapper<K1, V1, K2, V2>>, Class<K2>, Class<V2>) - Static method in class org.apache.crunch.lib.Mapreduce
 
map(V) - Method in class org.apache.crunch.lib.sort.SortFns.AvroGenericFn
 
map(V) - Method in class org.apache.crunch.lib.sort.SortFns.SingleKeyFn
 
map(V) - Method in class org.apache.crunch.lib.sort.SortFns.TupleKeyFn
 
map(S) - Method in class org.apache.crunch.MapFn
Maps the given input into an instance of the output type.
map(Pair<Object, Iterable<Object>>) - Method in class org.apache.crunch.types.PGroupedTableType.PairIterableMapFn
 
MapDeepCopier<T> - Class in org.apache.crunch.types
 
MapDeepCopier(PType<T>) - Constructor for class org.apache.crunch.types.MapDeepCopier
 
MapFn<S,T> - Class in org.apache.crunch
A DoFn for the common case of emitting exactly one value for each input record.
MapFn() - Constructor for class org.apache.crunch.MapFn
 
MapFunction - Class in org.apache.crunch.impl.spark.fn
 
MapFunction(MapFn, SparkRuntimeContext) - Constructor for class org.apache.crunch.impl.spark.fn.MapFunction
 
mapKeys(MapFn<K, K2>, PType<K2>) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
mapKeys(String, MapFn<K, K2>, PType<K2>) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
mapKeys(PTable<K1, V>, MapFn<K1, K2>, PType<K2>) - Static method in class org.apache.crunch.lib.PTables
Maps a PTable<K1, V> to a PTable<K2, V> using the given MapFn<K1, K2> on the keys of the PTable.
mapKeys(String, PTable<K1, V>, MapFn<K1, K2>, PType<K2>) - Static method in class org.apache.crunch.lib.PTables
Maps a PTable<K1, V> to a PTable<K2, V> using the given MapFn<K1, K2> on the keys of the PTable.
mapKeys(MapFn<K, K2>, PType<K2>) - Method in interface org.apache.crunch.PTable
Returns a PTable that has the same values as this instance, but uses the given function to map the keys.
mapKeys(String, MapFn<K, K2>, PType<K2>) - Method in interface org.apache.crunch.PTable
Returns a PTable that has the same values as this instance, but uses the given function to map the keys.
MapOutputFunction<K,V> - Class in org.apache.crunch.impl.spark.fn
 
MapOutputFunction(SerDe, SerDe) - Constructor for class org.apache.crunch.impl.spark.fn.MapOutputFunction
 
Mapred - Class in org.apache.crunch.lib
Static functions for working with legacy Mappers and Reducers that live under the org.apache.hadoop.mapred.* package as part of Crunch pipelines.
Mapred() - Constructor for class org.apache.crunch.lib.Mapred
 
Mapreduce - Class in org.apache.crunch.lib
Static functions for working with legacy Mappers and Reducers that live under the org.apache.hadoop.mapreduce.* package as part of Crunch pipelines.
Mapreduce() - Constructor for class org.apache.crunch.lib.Mapreduce
 
MapReduceTarget - Interface in org.apache.crunch.io
 
maps(PType<T>) - Static method in class org.apache.crunch.types.avro.Avros
 
maps(PType<T>) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
maps(PType<T>) - Method in interface org.apache.crunch.types.PTypeFamily
 
maps(PType<T>) - Static method in class org.apache.crunch.types.writable.Writables
 
maps(PType<T>) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
MapsideJoinStrategy<K,U,V> - Class in org.apache.crunch.lib.join
Utility for doing map side joins on a common key between two PTables.
MapsideJoinStrategy() - Constructor for class org.apache.crunch.lib.join.MapsideJoinStrategy
Deprecated. Use the MapsideJoinStrategy.create() factory method instead.
MapsideJoinStrategy(boolean) - Constructor for class org.apache.crunch.lib.join.MapsideJoinStrategy
Deprecated. Use the MapsideJoinStrategy.create(boolean) factory method instead.
mapValues(MapFn<Iterable<V>, U>, PType<U>) - Method in class org.apache.crunch.impl.dist.collect.BaseGroupedTable
 
mapValues(String, MapFn<Iterable<V>, U>, PType<U>) - Method in class org.apache.crunch.impl.dist.collect.BaseGroupedTable
 
mapValues(MapFn<V, U>, PType<U>) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
mapValues(String, MapFn<V, U>, PType<U>) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
mapValues(PTable<K, U>, MapFn<U, V>, PType<V>) - Static method in class org.apache.crunch.lib.PTables
Maps a PTable<K, U> to a PTable<K, V> using the given MapFn<U, V> on the values of the PTable.
mapValues(String, PTable<K, U>, MapFn<U, V>, PType<V>) - Static method in class org.apache.crunch.lib.PTables
Maps a PTable<K, U> to a PTable<K, V> using the given MapFn<U, V> on the values of the PTable.
mapValues(PGroupedTable<K, U>, MapFn<Iterable<U>, V>, PType<V>) - Static method in class org.apache.crunch.lib.PTables
An analogue of the mapValues function for PGroupedTable<K, U> collections.
mapValues(String, PGroupedTable<K, U>, MapFn<Iterable<U>, V>, PType<V>) - Static method in class org.apache.crunch.lib.PTables
An analogue of the mapValues function for PGroupedTable<K, U> collections.
mapValues(MapFn<Iterable<V>, U>, PType<U>) - Method in interface org.apache.crunch.PGroupedTable
Maps the Iterable<V> elements of each record to a new type.
mapValues(String, MapFn<Iterable<V>, U>, PType<U>) - Method in interface org.apache.crunch.PGroupedTable
Maps the Iterable<V> elements of each record to a new type.
mapValues(MapFn<V, U>, PType<U>) - Method in interface org.apache.crunch.PTable
Returns a PTable that has the same keys as this instance, but uses the given function to map the values.
mapValues(String, MapFn<V, U>, PType<U>) - Method in interface org.apache.crunch.PTable
Returns a PTable that has the same keys as this instance, but uses the given function to map the values.
markLogged() - Method in exception org.apache.crunch.CrunchRuntimeException
Indicate that this exception has been written to the debug logs.
materialize() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
materialize(PCollection<T>) - Method in class org.apache.crunch.impl.mem.MemPipeline
 
materialize(PCollection<T>) - Method in class org.apache.crunch.impl.mr.MRPipeline
 
materialize(PCollection<T>) - Method in class org.apache.crunch.impl.spark.SparkPipeline
 
materialize() - Method in interface org.apache.crunch.PCollection
Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally.
materialize(PCollection<T>) - Method in interface org.apache.crunch.Pipeline
Create the given PCollection and read the data it contains into the returned Collection instance for client use.
materialize(PCollection<T>) - Method in class org.apache.crunch.util.CrunchTool
 
materializeAt(SourceTarget<S>) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
materializeToMap() - Method in class org.apache.crunch.impl.dist.collect.PTableBase
Returns a Map made up of the keys and values in this PTable.
materializeToMap() - Method in interface org.apache.crunch.PTable
Returns a Map made up of the keys and values in this PTable.
max() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
max(PCollection<S>) - Static method in class org.apache.crunch.lib.Aggregate
Returns the largest numerical element from the input collection.
max() - Method in interface org.apache.crunch.PCollection
Returns a PObject of the maximum element of this instance.
MAX_BIGINTS() - Static method in class org.apache.crunch.fn.Aggregators
Return the maximum of all given BigInteger values.
MAX_BIGINTS(int) - Static method in class org.apache.crunch.fn.Aggregators
Return the n largest BigInteger values (or fewer if there are fewer values than n).
MAX_DOUBLES() - Static method in class org.apache.crunch.fn.Aggregators
Return the maximum of all given double values.
MAX_DOUBLES(int) - Static method in class org.apache.crunch.fn.Aggregators
Return the n largest double values (or fewer if there are fewer values than n).
MAX_FLOATS() - Static method in class org.apache.crunch.fn.Aggregators
Return the maximum of all given float values.
MAX_FLOATS(int) - Static method in class org.apache.crunch.fn.Aggregators
Return the n largest float values (or fewer if there are fewer values than n).
MAX_INTS() - Static method in class org.apache.crunch.fn.Aggregators
Return the maximum of all given int values.
MAX_INTS(int) - Static method in class org.apache.crunch.fn.Aggregators
Return the n largest int values (or fewer if there are fewer values than n).
MAX_LONGS() - Static method in class org.apache.crunch.fn.Aggregators
Return the maximum of all given long values.
MAX_LONGS(int) - Static method in class org.apache.crunch.fn.Aggregators
Return the n largest long values (or fewer if there are fewer values than n).
MAX_N(int, Class<V>) - Static method in class org.apache.crunch.fn.Aggregators
Return the n largest values (or fewer if there are fewer values than n).
MAX_REDUCERS - Static variable in class org.apache.crunch.util.PartitionUtils
Set an upper limit on the number of reducers the Crunch planner will set for an MR job when it tries to determine how many reducers to use based on the input size.
MemPipeline - Class in org.apache.crunch.impl.mem
 
min() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
min(PCollection<S>) - Static method in class org.apache.crunch.lib.Aggregate
Returns the smallest numerical element from the input collection.
min() - Method in interface org.apache.crunch.PCollection
Returns a PObject of the minimum element of this instance.
MIN_BIGINTS() - Static method in class org.apache.crunch.fn.Aggregators
Return the minimum of all given BigInteger values.
MIN_BIGINTS(int) - Static method in class org.apache.crunch.fn.Aggregators
Return the n smallest BigInteger values (or fewer if there are fewer values than n).
MIN_DOUBLES() - Static method in class org.apache.crunch.fn.Aggregators
Return the minimum of all given double values.
MIN_DOUBLES(int) - Static method in class org.apache.crunch.fn.Aggregators
Return the n smallest double values (or fewer if there are fewer values than n).
MIN_FLOATS() - Static method in class org.apache.crunch.fn.Aggregators
Return the minimum of all given float values.
MIN_FLOATS(int) - Static method in class org.apache.crunch.fn.Aggregators
Return the n smallest float values (or fewer if there are fewer values than n).
MIN_INTS() - Static method in class org.apache.crunch.fn.Aggregators
Return the minimum of all given int values.
MIN_INTS(int) - Static method in class org.apache.crunch.fn.Aggregators
Return the n smallest int values (or fewer if there are fewer values than n).
MIN_LONGS() - Static method in class org.apache.crunch.fn.Aggregators
Return the minimum of all given long values.
MIN_LONGS(int) - Static method in class org.apache.crunch.fn.Aggregators
Return the n smallest long values (or fewer if there are fewer values than n).
MIN_N(int, Class<V>) - Static method in class org.apache.crunch.fn.Aggregators
Return the n smallest values (or fewer if there are fewer values than n).
MRCollection - Interface in org.apache.crunch.impl.dist.collect
 
MRJob - Interface in org.apache.crunch.impl.mr
A Hadoop MapReduce job managed by Crunch.
MRJob.State - Enum in org.apache.crunch.impl.mr
A job will be in one of the following states.
MRPipeline - Class in org.apache.crunch.impl.mr
Pipeline implementation that is executed within Hadoop MapReduce.
MRPipeline(Class<?>) - Constructor for class org.apache.crunch.impl.mr.MRPipeline
Instantiate with a default Configuration and name.
MRPipeline(Class<?>, String) - Constructor for class org.apache.crunch.impl.mr.MRPipeline
Instantiate with a custom pipeline name.
MRPipeline(Class<?>, Configuration) - Constructor for class org.apache.crunch.impl.mr.MRPipeline
Instantiate with a custom configuration and default naming.
MRPipeline(Class<?>, String, Configuration) - Constructor for class org.apache.crunch.impl.mr.MRPipeline
Instantiate with a custom name and configuration.
MRPipelineExecution - Interface in org.apache.crunch.impl.mr
 
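Several Aggregators entries in this section (MAX_INTS(int), MIN_LONGS(int), MAX_N and similar) are described as returning "the n largest" or "n smallest" values, or fewer if fewer exist. One common way to compute that is a bounded heap, sketched below in plain Java. This is an illustration only, not the Crunch implementation: the class `LargestNSketch` and its `largest` method are hypothetical, and the sketch keeps duplicate values, which may or may not match how the real aggregators behave.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class LargestNSketch {
    // Returns the n largest values in descending order, or fewer if the
    // input holds fewer than n values. Memory use is bounded by n.
    static List<Integer> largest(Iterable<Integer> values, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the smallest retained value sits at the head, so it is
        // the one evicted when a new value pushes the size past n.
        PriorityQueue<Integer> heap = new PriorityQueue<>(n);
        for (Integer value : values) {
            heap.add(value);
            if (heap.size() > n) {
                heap.poll();
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        Collections.sort(result, Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(largest(Arrays.asList(5, 1, 9, 3), 2));
        System.out.println(largest(Arrays.asList(4), 3));
    }
}
```

The "n smallest" variants follow the same pattern with the heap ordering reversed.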

N

name - Variable in class org.apache.crunch.contrib.io.jdbc.IdentifiableName
 
newReader(Schema) - Static method in class org.apache.crunch.types.avro.Avros
 
newReader(AvroType<T>) - Static method in class org.apache.crunch.types.avro.Avros
 
newWriter(Schema) - Static method in class org.apache.crunch.types.avro.Avros
 
newWriter(AvroType<T>) - Static method in class org.apache.crunch.types.avro.Avros
 
next() - Method in class org.apache.crunch.contrib.text.Tokenizer
Advance this Tokenizer and return the next String from the Scanner.
next() - Method in class org.apache.crunch.util.DoFnIterator
 
nextBoolean() - Method in class org.apache.crunch.contrib.text.Tokenizer
Advance this Tokenizer and return the next Boolean from the Scanner.
nextDouble() - Method in class org.apache.crunch.contrib.text.Tokenizer
Advance this Tokenizer and return the next Double from the Scanner.
nextFloat() - Method in class org.apache.crunch.contrib.text.Tokenizer
Advance this Tokenizer and return the next Float from the Scanner.
nextInt() - Method in class org.apache.crunch.contrib.text.Tokenizer
Advance this Tokenizer and return the next Integer from the Scanner.
nextLong() - Method in class org.apache.crunch.contrib.text.Tokenizer
Advance this Tokenizer and return the next Long from the Scanner.
NoOpDeepCopier<T> - Class in org.apache.crunch.types
A DeepCopier that does nothing, and just returns the input value without copying anything.
not(FilterFn<S>) - Static method in class org.apache.crunch.fn.FilterFns
Accept an entry if the given filter does not accept it.
nulls() - Static method in class org.apache.crunch.types.avro.Avros
 
nulls() - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
nulls() - Method in interface org.apache.crunch.types.PTypeFamily
 
nulls() - Static method in class org.apache.crunch.types.writable.Writables
 
nulls() - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
numPartitions() - Method in class org.apache.crunch.impl.spark.SparkPartitioner
 
numReducers(int) - Method in class org.apache.crunch.GroupingOptions.Builder
 
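The Tokenizer entries in this section (next(), nextInt(), nextDouble() and so on) are each described as advancing and returning the next typed value "from the Scanner". A small sketch of that underlying mechanism, using java.util.Scanner directly, is shown below. The class `ScannerTokensSketch`, the comma delimiter, and the sample field layout are assumptions for illustration. The Crunch Tokenizer and TokenizerFactory classes themselves are not used here.

```java
import java.util.Locale;
import java.util.Scanner;

public class ScannerTokensSketch {
    // Parses a delimited line of the assumed form "name,count,score,active"
    // by pulling one typed token at a time from a Scanner.
    static String describe(String line) {
        try (Scanner scanner = new Scanner(line)) {
            // Locale.ROOT keeps number parsing independent of the host locale.
            scanner.useDelimiter(",").useLocale(Locale.ROOT);
            String name = scanner.next();
            int count = scanner.nextInt();
            double score = scanner.nextDouble();
            boolean active = scanner.nextBoolean();
            return name + "|" + count + "|" + score + "|" + active;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("widget,3,2.5,true"));
    }
}
```

Each typed call throws java.util.InputMismatchException when the next token does not match the requested type, so callers that read untrusted text usually need error handling around the parse.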

O

of(T, U) - Static method in class org.apache.crunch.Pair
 
of(A, B, C) - Static method in class org.apache.crunch.Tuple3
 
of(A, B, C, D) - Static method in class org.apache.crunch.Tuple4
 
of(Object...) - Static method in class org.apache.crunch.TupleN
 
OneToManyJoin - Class in org.apache.crunch.lib.join
Optimized join for situations where exactly one value is being joined with any other number of values based on a common key.
OneToManyJoin() - Constructor for class org.apache.crunch.lib.join.OneToManyJoin
 
oneToManyJoin(PTable<K, U>, PTable<K, V>, DoFn<Pair<U, Iterable<V>>, T>, PType<T>) - Static method in class org.apache.crunch.lib.join.OneToManyJoin
Performs a join on two tables, where the left table only contains a single value per key.
oneToManyJoin(PTable<K, U>, PTable<K, V>, DoFn<Pair<U, Iterable<V>>, T>, PType<T>, int) - Static method in class org.apache.crunch.lib.join.OneToManyJoin
Supports a user-specified number of reducers for the one-to-many join.
or(FilterFn<S>, FilterFn<S>) - Static method in class org.apache.crunch.fn.FilterFns
Accept an entry if at least one of the given filters accepts it, using short-circuit evaluation.
or(FilterFn<S>...) - Static method in class org.apache.crunch.fn.FilterFns
Accept an entry if at least one of the given filters accepts it, using short-circuit evaluation.
order() - Method in class org.apache.crunch.lib.Sort.ColumnOrder
 
org.apache.crunch - package org.apache.crunch
Client-facing API and core abstractions.
org.apache.crunch.contrib - package org.apache.crunch.contrib
User contributions that may be interesting for special applications.
org.apache.crunch.contrib.bloomfilter - package org.apache.crunch.contrib.bloomfilter
Support for creating Bloom Filters.
org.apache.crunch.contrib.io.jdbc - package org.apache.crunch.contrib.io.jdbc
Support for reading data from RDBMS using JDBC.
org.apache.crunch.contrib.text - package org.apache.crunch.contrib.text
 
org.apache.crunch.examples - package org.apache.crunch.examples
Example applications demonstrating various aspects of Crunch.
org.apache.crunch.fn - package org.apache.crunch.fn
Commonly used functions for manipulating collections.
org.apache.crunch.impl - package org.apache.crunch.impl
 
org.apache.crunch.impl.dist - package org.apache.crunch.impl.dist
 
org.apache.crunch.impl.dist.collect - package org.apache.crunch.impl.dist.collect
 
org.apache.crunch.impl.mem - package org.apache.crunch.impl.mem
In-memory Pipeline implementation for rapid prototyping and testing.
org.apache.crunch.impl.mr - package org.apache.crunch.impl.mr
A Pipeline implementation that runs on Hadoop MapReduce.
org.apache.crunch.impl.spark - package org.apache.crunch.impl.spark
 
org.apache.crunch.impl.spark.collect - package org.apache.crunch.impl.spark.collect
 
org.apache.crunch.impl.spark.fn - package org.apache.crunch.impl.spark.fn
 
org.apache.crunch.impl.spark.serde - package org.apache.crunch.impl.spark.serde
 
org.apache.crunch.io - package org.apache.crunch.io
Data input and output for Pipelines.
org.apache.crunch.lib - package org.apache.crunch.lib
Joining, sorting, aggregating, and other commonly used functionality.
org.apache.crunch.lib.join - package org.apache.crunch.lib.join
Inner and outer joins on collections.
org.apache.crunch.lib.sort - package org.apache.crunch.lib.sort
 
org.apache.crunch.test - package org.apache.crunch.test
Utilities for testing Crunch-based applications.
org.apache.crunch.types - package org.apache.crunch.types
Common functionality for business object serialization.
org.apache.crunch.types.avro - package org.apache.crunch.types.avro
Business object serialization using Apache Avro.
org.apache.crunch.types.writable - package org.apache.crunch.types.writable
Business object serialization using Hadoop's Writables framework.
org.apache.crunch.util - package org.apache.crunch.util
An assorted set of utilities.
org.apache.hadoop.mapred - package org.apache.hadoop.mapred
 
outputConf(String, String) - Method in interface org.apache.crunch.Target
Adds the given key-value pair to the Configuration instance that is used to write this Target.
OutputConverterFunction<K,V,S> - Class in org.apache.crunch.impl.spark.fn
 
OutputConverterFunction(Converter<K, V, S, ?>) - Constructor for class org.apache.crunch.impl.spark.fn.OutputConverterFunction
 
OutputHandler - Interface in org.apache.crunch.io
 
outputKey(S) - Method in interface org.apache.crunch.types.Converter
 
outputValue(S) - Method in interface org.apache.crunch.types.Converter
 
override(ReaderWriterFactory) - Method in class org.apache.crunch.types.avro.AvroMode
Deprecated. Use AvroMode.withFactory(ReaderWriterFactory) instead.
overridePathProperties(Configuration) - Method in class org.apache.crunch.test.TemporaryPath
Set all keys specified in the constructor to temporary directories.
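The FilterFns entries not(FilterFn<S>) and or(FilterFn<S>...) describe filter combinators, the latter "using short-circuit evaluation". The sketch below shows what that means, with java.util.function.Predicate standing in for FilterFn. The class `FilterCombinatorSketch` and its methods are hypothetical and are not part of Crunch.

```java
import java.util.function.Predicate;

public class FilterCombinatorSketch {
    // Accepts a value if at least one filter accepts it. Filters are tried
    // in order and evaluation stops at the first one that accepts.
    @SafeVarargs
    static <S> Predicate<S> or(Predicate<S>... filters) {
        return value -> {
            for (Predicate<S> filter : filters) {
                if (filter.test(value)) {
                    return true; // short-circuit: later filters are skipped
                }
            }
            return false;
        };
    }

    // Accepts a value exactly when the given filter rejects it.
    static <S> Predicate<S> not(Predicate<S> filter) {
        return value -> !filter.test(value);
    }

    public static void main(String[] args) {
        Predicate<Integer> positive = v -> v > 0;
        Predicate<Integer> even = v -> v % 2 == 0;
        Predicate<Integer> combined = FilterCombinatorSketch.<Integer>or(positive, even);
        System.out.println(combined.test(3));
        System.out.println(not(combined).test(-3));
    }
}
```

Short-circuiting matters when a later filter is expensive or has side effects such as incrementing counters, since it may never run for records an earlier filter already accepted.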

P

Pair<K,V> - Class in org.apache.crunch
A convenience class for two-element Tuples.
Pair(K, V) - Constructor for class org.apache.crunch.Pair
 
PAIR - Static variable in class org.apache.crunch.types.TupleFactory
 
pair2tupleFunc() - Static method in class org.apache.crunch.impl.spark.GuavaUtils
 
pairAggregator(Aggregator<V1>, Aggregator<V2>) - Static method in class org.apache.crunch.fn.Aggregators
Apply separate aggregators to each component of a Pair.
PairFlatMapDoFn<T,K,V> - Class in org.apache.crunch.impl.spark.fn
 
PairFlatMapDoFn(DoFn<T, Pair<K, V>>, SparkRuntimeContext) - Constructor for class org.apache.crunch.impl.spark.fn.PairFlatMapDoFn
 
PairFlatMapPairDoFn<K,V,K2,V2> - Class in org.apache.crunch.impl.spark.fn
 
PairFlatMapPairDoFn(DoFn<Pair<K, V>, Pair<K2, V2>>, SparkRuntimeContext) - Constructor for class org.apache.crunch.impl.spark.fn.PairFlatMapPairDoFn
 
PairMapFn<K,V,S,T> - Class in org.apache.crunch.fn
 
PairMapFn(MapFn<K, S>, MapFn<V, T>) - Constructor for class org.apache.crunch.fn.PairMapFn
 
PairMapFunction<K,V,S> - Class in org.apache.crunch.impl.spark.fn
 
PairMapFunction(MapFn<Pair<K, V>, S>, SparkRuntimeContext) - Constructor for class org.apache.crunch.impl.spark.fn.PairMapFunction
 
PairMapIterableFunction<K,V,S,T> - Class in org.apache.crunch.impl.spark.fn
 
PairMapIterableFunction(MapFn<Pair<K, List<V>>, Pair<S, Iterable<T>>>, SparkRuntimeContext) - Constructor for class org.apache.crunch.impl.spark.fn.PairMapIterableFunction
 
pairs(PType<V1>, PType<V2>) - Static method in class org.apache.crunch.types.avro.Avros
 
pairs(PType<V1>, PType<V2>) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
pairs(PType<V1>, PType<V2>) - Method in interface org.apache.crunch.types.PTypeFamily
 
pairs(PType<V1>, PType<V2>) - Static method in class org.apache.crunch.types.writable.Writables
 
pairs(PType<V1>, PType<V2>) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
parallelDo(DoFn<S, T>, PType<T>) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
parallelDo(String, DoFn<S, T>, PType<T>) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
parallelDo(String, DoFn<S, T>, PType<T>, ParallelDoOptions) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
parallelDo(DoFn<S, Pair<K, V>>, PTableType<K, V>) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
parallelDo(String, DoFn<S, Pair<K, V>>, PTableType<K, V>) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
parallelDo(String, DoFn<S, Pair<K, V>>, PTableType<K, V>, ParallelDoOptions) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
parallelDo(DoFn<S, T>, PType<T>) - Method in interface org.apache.crunch.PCollection
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
parallelDo(String, DoFn<S, T>, PType<T>) - Method in interface org.apache.crunch.PCollection
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
parallelDo(String, DoFn<S, T>, PType<T>, ParallelDoOptions) - Method in interface org.apache.crunch.PCollection
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
parallelDo(DoFn<S, Pair<K, V>>, PTableType<K, V>) - Method in interface org.apache.crunch.PCollection
Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection.
parallelDo(String, DoFn<S, Pair<K, V>>, PTableType<K, V>) - Method in interface org.apache.crunch.PCollection
Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection.
parallelDo(String, DoFn<S, Pair<K, V>>, PTableType<K, V>, ParallelDoOptions) - Method in interface org.apache.crunch.PCollection
Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection.
ParallelDoOptions - Class in org.apache.crunch
Container class that includes optional information about a parallelDo operation applied to a PCollection.
ParallelDoOptions.Builder - Class in org.apache.crunch
 
ParallelDoOptions.Builder() - Constructor for class org.apache.crunch.ParallelDoOptions.Builder
 
Parse - Class in org.apache.crunch.contrib.text
Methods for parsing instances of PCollection<String> into PCollections of strongly-typed tuples.
parse(String, PCollection<String>, Extractor<T>) - Static method in class org.apache.crunch.contrib.text.Parse
Parses the lines of the input PCollection<String> and returns a PCollection<T> using the given Extractor<T>.
parse(String, PCollection<String>, PTypeFamily, Extractor<T>) - Static method in class org.apache.crunch.contrib.text.Parse
Parses the lines of the input PCollection<String> and returns a PCollection<T> using the given Extractor<T> that uses the given PTypeFamily.
parseTable(String, PCollection<String>, Extractor<Pair<K, V>>) - Static method in class org.apache.crunch.contrib.text.Parse
Parses the lines of the input PCollection<String> and returns a PTable<K, V> using the given Extractor<Pair<K, V>>.
parseTable(String, PCollection<String>, PTypeFamily, Extractor<Pair<K, V>>) - Static method in class org.apache.crunch.contrib.text.Parse
Parses the lines of the input PCollection<String> and returns a PTable<K, V> using the given Extractor<Pair<K, V>> that uses the given PTypeFamily.
partition - Variable in class org.apache.crunch.impl.spark.IntByteArray
 
PartitionedMapOutputFunction<K,V> - Class in org.apache.crunch.impl.spark.fn
 
PartitionedMapOutputFunction(SerDe<K>, SerDe<V>, PGroupedTableType<K, V>, Class<? extends Partitioner>, int, SparkRuntimeContext) - Constructor for class org.apache.crunch.impl.spark.fn.PartitionedMapOutputFunction
 
PARTITIONER_PATH - Static variable in class org.apache.crunch.lib.sort.TotalOrderPartitioner
 
partitionerClass(Class<? extends Partitioner>) - Method in class org.apache.crunch.GroupingOptions.Builder
 
PartitionUtils - Class in org.apache.crunch.util
Helper functions and settings for determining the number of reducers to use in a pipeline job created by the Crunch planner.
PartitionUtils() - Constructor for class org.apache.crunch.util.PartitionUtils
 
PathTarget - Interface in org.apache.crunch.io
A target whose output goes to a given path on a file system.
PCollection<S> - Interface in org.apache.crunch
A representation of an immutable, distributed collection of elements that is the fundamental target of computations in Crunch.
PCollectionFactory - Interface in org.apache.crunch.impl.dist.collect
 
PCollectionImpl<S> - Class in org.apache.crunch.impl.dist.collect
 
PCollectionImpl(String, DistributedPipeline) - Constructor for class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
PCollectionImpl(String, DistributedPipeline, ParallelDoOptions) - Constructor for class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
PCollectionImpl.Visitor - Interface in org.apache.crunch.impl.dist.collect
 
PGroupedTable<K,V> - Interface in org.apache.crunch
The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job.
PGroupedTableImpl<K,V> - Class in org.apache.crunch.impl.spark.collect
 
PGroupedTableType<K,V> - Class in org.apache.crunch.types
The PType instance for PGroupedTable instances.
PGroupedTableType(PTableType<K, V>) - Constructor for class org.apache.crunch.types.PGroupedTableType
 
PGroupedTableType.PairIterableMapFn<K,V> - Class in org.apache.crunch.types
 
PGroupedTableType.PairIterableMapFn(MapFn<Object, K>, MapFn<Object, V>) - Constructor for class org.apache.crunch.types.PGroupedTableType.PairIterableMapFn
 
Pipeline - Interface in org.apache.crunch
Manages the state of a pipeline execution.
PipelineExecution - Interface in org.apache.crunch
A handle to allow clients to control a Crunch pipeline as it runs.
PipelineExecution.Status - Enum in org.apache.crunch
 
PipelineResult - Class in org.apache.crunch
Container for the results of a call to run or done on the Pipeline interface that includes details and statistics about the component stages of the data pipeline.
PipelineResult(List<PipelineResult.StageResult>, PipelineExecution.Status) - Constructor for class org.apache.crunch.PipelineResult
 
PipelineResult.StageResult - Class in org.apache.crunch
 
PipelineResult.StageResult(String, Counters) - Constructor for class org.apache.crunch.PipelineResult.StageResult
 
PipelineResult.StageResult(String, Counters, long, long) - Constructor for class org.apache.crunch.PipelineResult.StageResult
 
PipelineResult.StageResult(String, String, Counters, long, long, long, long) - Constructor for class org.apache.crunch.PipelineResult.StageResult
 
plan() - Method in class org.apache.crunch.impl.mr.MRPipeline
 
PObject<T> - Interface in org.apache.crunch
A PObject represents a singleton object value that results from a distributed computation.
process(S, Emitter<Pair<String, BloomFilter>>) - Method in class org.apache.crunch.contrib.bloomfilter.BloomFilterFn
 
process(S, Emitter<T>) - Method in class org.apache.crunch.DoFn
Processes the records from a PCollection.
process(T, Emitter<T>) - Method in class org.apache.crunch.FilterFn
 
process(Pair<Integer, Iterable<Pair<K, V>>>, Emitter<Pair<Integer, Pair<K, V>>>) - Method in class org.apache.crunch.lib.Aggregate.TopKCombineFn
 
process(Pair<K, V>, Emitter<Pair<Integer, Pair<K, V>>>) - Method in class org.apache.crunch.lib.Aggregate.TopKFn
 
process(Pair<Pair<K, Integer>, Iterable<Pair<U, V>>>, Emitter<Pair<K, Pair<U, V>>>) - Method in class org.apache.crunch.lib.join.JoinFn
Split up the input record to make coding a bit more manageable.
process(S, Emitter<T>) - Method in class org.apache.crunch.MapFn
 
Protos - Class in org.apache.crunch.types
Utility functions for working with protocol buffers in Crunch.
Protos() - Constructor for class org.apache.crunch.types.Protos
 
protos(Class<T>, PTypeFamily) - Static method in class org.apache.crunch.types.PTypes
Constructs a PType for the given protocol buffer.
protos(Class<T>, PTypeFamily, SerializableSupplier<ExtensionRegistry>) - Static method in class org.apache.crunch.types.PTypes
Constructs a PType for a protocol buffer, using the given SerializableSupplier to provide an ExtensionRegistry to use in reading the given protobuf.
PTable<K,V> - Interface in org.apache.crunch
A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
PTableBase<K,V> - Class in org.apache.crunch.impl.dist.collect
 
PTableBase(String, DistributedPipeline) - Constructor for class org.apache.crunch.impl.dist.collect.PTableBase
 
PTableBase(String, DistributedPipeline, ParallelDoOptions) - Constructor for class org.apache.crunch.impl.dist.collect.PTableBase
 
PTables - Class in org.apache.crunch.lib
Methods for performing common operations on PTables.
PTables() - Constructor for class org.apache.crunch.lib.PTables
 
PTableType<K,V> - Interface in org.apache.crunch.types
An extension of PType specifically for PTable objects.
PType<T> - Interface in org.apache.crunch.types
A PType defines a mapping between a data type that is used in a Crunch pipeline and a serialization and storage format that is used to read/write data from/to HDFS.
PTypeFamily - Interface in org.apache.crunch.types
An abstract factory for creating PType instances that have the same serialization/storage backing format.
PTypes - Class in org.apache.crunch.types
Utility functions for creating common types of derived PTypes, e.g., for JSON data, protocol buffers, and Thrift records.
PTypes() - Constructor for class org.apache.crunch.types.PTypes
 
PTypeUtils - Class in org.apache.crunch.types
Utilities for converting between PTypes from different PTypeFamily implementations.

Q

quadAggregator(Aggregator<V1>, Aggregator<V2>, Aggregator<V3>, Aggregator<V4>) - Static method in class org.apache.crunch.fn.Aggregators
Apply separate aggregators to each component of a Tuple4.
quads(PType<V1>, PType<V2>, PType<V3>, PType<V4>) - Static method in class org.apache.crunch.types.avro.Avros
 
quads(PType<V1>, PType<V2>, PType<V3>, PType<V4>) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
quads(PType<V1>, PType<V2>, PType<V3>, PType<V4>) - Method in interface org.apache.crunch.types.PTypeFamily
 
quads(PType<V1>, PType<V2>, PType<V3>, PType<V4>) - Static method in class org.apache.crunch.types.writable.Writables
 
quads(PType<V1>, PType<V2>, PType<V3>, PType<V4>) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 

R

read(Source<S>) - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
read(TableSource<K, V>) - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
read(Source<T>) - Method in class org.apache.crunch.impl.mem.MemPipeline
 
read(TableSource<K, V>) - Method in class org.apache.crunch.impl.mem.MemPipeline
 
read(FileSystem, Path) - Method in interface org.apache.crunch.io.FileReaderFactory
 
read(Configuration) - Method in interface org.apache.crunch.io.ReadableSource
Returns an Iterable that contains the contents of this source.
read(Source<T>) - Method in interface org.apache.crunch.Pipeline
Converts the given Source into a PCollection that is available to jobs run using this Pipeline instance.
read(TableSource<K, V>) - Method in interface org.apache.crunch.Pipeline
A version of the read method for TableSource instances that map to PTables.
read(TaskInputOutputContext<?, ?, ?, ?>) - Method in interface org.apache.crunch.ReadableData
Read the data referenced by this instance within the given context.
read(Source<T>) - Method in class org.apache.crunch.util.CrunchTool
 
read(TableSource<K, V>) - Method in class org.apache.crunch.util.CrunchTool
 
read(TaskInputOutputContext<?, ?, ?, ?>) - Method in class org.apache.crunch.util.DelegatingReadableData
 
read(Configuration, Path) - Static method in class org.apache.crunch.util.DistCache
 
read(TaskInputOutputContext<?, ?, ?, ?>) - Method in class org.apache.crunch.util.UnionReadableData
 
ReadableData<T> - Interface in org.apache.crunch
Represents the contents of a data source that can be read on the cluster from within one of the tasks running as part of a Crunch pipeline.
ReadableSource<T> - Interface in org.apache.crunch.io
An extension of the Source interface that indicates that a Source instance may be read as a series of records by the client code.
ReadableSourceTarget<T> - Interface in org.apache.crunch.io
An interface that indicates that a SourceTarget instance can be read into the local client.
ReaderWriterFactory - Interface in org.apache.crunch.types.avro
Interface for accessing DatumReader, DatumWriter, and Data classes.
readFields(DataInput) - Method in class org.apache.crunch.contrib.io.jdbc.IdentifiableName
 
readFields(ResultSet) - Method in class org.apache.crunch.contrib.io.jdbc.IdentifiableName
 
readFields(DataInput) - Method in class org.apache.crunch.io.FormatBundle
 
readFields(DataInput) - Method in class org.apache.crunch.types.writable.TupleWritable
readFields(DataInput) - Method in class org.apache.crunch.types.writable.UnionWritable
 
readTextFile(String) - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
readTextFile(String) - Method in class org.apache.crunch.impl.mem.MemPipeline
 
readTextFile(String) - Method in interface org.apache.crunch.Pipeline
A convenience method for reading a text file.
readTextFile(String) - Method in class org.apache.crunch.util.CrunchTool
 
records(Class<T>) - Static method in class org.apache.crunch.types.avro.Avros
 
records(Class<T>) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
records(Class<T>) - Method in interface org.apache.crunch.types.PTypeFamily
 
records(Class<T>) - Static method in class org.apache.crunch.types.writable.Writables
 
records(Class<T>) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
reduce(PGroupedTable<K1, V1>, Class<? extends Reducer<K1, V1, K2, V2>>, Class<K2>, Class<V2>) - Static method in class org.apache.crunch.lib.Mapred
 
reduce(PGroupedTable<K1, V1>, Class<? extends Reducer<K1, V1, K2, V2>>, Class<K2>, Class<V2>) - Static method in class org.apache.crunch.lib.Mapreduce
 
ReduceGroupingFunction - Class in org.apache.crunch.impl.spark.fn
 
ReduceGroupingFunction(GroupingOptions, PGroupedTableType, SparkRuntimeContext) - Constructor for class org.apache.crunch.impl.spark.fn.ReduceGroupingFunction
 
ReduceInputFunction<K,V> - Class in org.apache.crunch.impl.spark.fn
 
ReduceInputFunction(SerDe<K>, SerDe<V>) - Constructor for class org.apache.crunch.impl.spark.fn.ReduceInputFunction
 
REFLECT - Static variable in class org.apache.crunch.types.avro.AvroMode
Default mode to use for reading and writing Reflect types.
REFLECT_DATA_FACTORY - Static variable in class org.apache.crunch.types.avro.Avros
Deprecated. as of 0.9.0; use AvroMode.REFLECT.override(ReaderWriterFactory)
REFLECT_DATA_FACTORY_CLASS - Static variable in class org.apache.crunch.types.avro.Avros
The name of the configuration parameter that tracks which reflection factory to use.
ReflectDataFactory - Class in org.apache.crunch.types.avro
A Factory class for constructing Avro reflection-related objects.
ReflectDataFactory() - Constructor for class org.apache.crunch.types.avro.ReflectDataFactory
 
reflects(Class<T>) - Static method in class org.apache.crunch.types.avro.Avros
 
reflects(Class<T>, Schema) - Static method in class org.apache.crunch.types.avro.Avros
 
register(Class<T>, AvroType<T>) - Static method in class org.apache.crunch.types.avro.Avros
 
register(Class<T>, WritableType<T, ? extends Writable>) - Static method in class org.apache.crunch.types.writable.Writables
 
registerComparable(Class<? extends WritableComparable>) - Static method in class org.apache.crunch.types.writable.Writables
Registers a WritableComparable class so that it can be used for comparing the fields inside of tuple types (e.g., pairs, trips, tupleN, etc.) for use in sorts and secondary sorts.
registerComparable(Class<? extends WritableComparable>, int) - Static method in class org.apache.crunch.types.writable.Writables
Registers a WritableComparable class with a given integer code to use for serializing and deserializing instances of this class that are defined inside of tuple types (e.g., pairs, trips, tupleN, etc.). Unregistered Writables are always serialized to bytes and cannot be used in comparisons (e.g., sorts and secondary sorts) according to their underlying types.
REJECT_ALL() - Static method in class org.apache.crunch.fn.FilterFns
Reject everything.
remove() - Method in class org.apache.crunch.util.DoFnIterator
 
replicas(int) - Method in class org.apache.crunch.CachingOptions.Builder
 
replicas() - Method in class org.apache.crunch.CachingOptions
Returns the number of replicas of the data that should be maintained in the cache.
requireSortedKeys() - Method in class org.apache.crunch.GroupingOptions.Builder
 
requireSortedKeys() - Method in class org.apache.crunch.GroupingOptions
 
reservoirSample(PCollection<T>, int) - Static method in class org.apache.crunch.lib.Sample
Select a fixed number of elements from the given PCollection with each element equally likely to be included in the sample.
reservoirSample(PCollection<T>, int, Long) - Static method in class org.apache.crunch.lib.Sample
A version of the reservoir sampling algorithm that uses a given seed, primarily for testing purposes.
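The reservoir sampling idea behind the two reservoirSample entries can be illustrated outside of Crunch. The following is a minimal single-machine sketch of the classic technique (Algorithm R) in plain Java; it is not Crunch's distributed implementation, and the class name ReservoirSketch, the Iterable input, and the primitive long seed are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of the Crunch API.
final class ReservoirSketch {

    // Returns up to k elements of the input, each element equally likely to be kept.
    static <T> List<T> reservoirSample(Iterable<T> input, int k, long seed) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        Random random = new Random(seed); // fixed seed makes the sample repeatable
        List<T> reservoir = new ArrayList<>();
        int seen = 0; // number of elements consumed so far (assumes fewer than 2^31)
        for (T item : input) {
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k elements.
                reservoir.add(item);
            } else {
                // Pick a slot uniformly in [0, seen); the new item survives
                // only if the slot falls inside the reservoir.
                int slot = random.nextInt(seen);
                if (slot < k) {
                    reservoir.set(slot, item);
                }
            }
        }
        return reservoir;
    }
}
```

Passing the same seed twice over the same input yields the same sample, which is the property the seeded overload describes as useful for testing.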
reset() - Method in interface org.apache.crunch.Aggregator
Clears the internal state of this Aggregator and prepares it for the values associated with the next key.
results() - Method in interface org.apache.crunch.Aggregator
Returns the current aggregated state of this instance.
ReverseAvroComparator<T> - Class in org.apache.crunch.lib.sort
 
ReverseAvroComparator() - Constructor for class org.apache.crunch.lib.sort.ReverseAvroComparator
 
ReverseWritableComparator<T> - Class in org.apache.crunch.lib.sort
 
ReverseWritableComparator() - Constructor for class org.apache.crunch.lib.sort.ReverseWritableComparator
 
rightJoin(PTable<K, U>, PTable<K, V>) - Static method in class org.apache.crunch.lib.Join
Performs a right outer join on the specified PTables.
RightOuterJoinFn<K,U,V> - Class in org.apache.crunch.lib.join
Used to perform the last step of a right outer join.
RightOuterJoinFn(PType<K>, PType<U>) - Constructor for class org.apache.crunch.lib.join.RightOuterJoinFn
 
run(String[]) - Method in class org.apache.crunch.examples.AverageBytesByIP
 
run(String[]) - Method in class org.apache.crunch.examples.SecondarySortExample
 
run(String[]) - Method in class org.apache.crunch.examples.SortExample
 
run(String[]) - Method in class org.apache.crunch.examples.TotalBytesByIP
 
run(String[]) - Method in class org.apache.crunch.examples.TotalWordCount
 
run(String[]) - Method in class org.apache.crunch.examples.WordAggregationHBase
 
run(String[]) - Method in class org.apache.crunch.examples.WordCount
 
run() - Method in class org.apache.crunch.impl.mem.MemPipeline
 
run() - Method in class org.apache.crunch.impl.mr.MRPipeline
 
run() - Method in class org.apache.crunch.impl.spark.SparkPipeline
 
run() - Method in interface org.apache.crunch.Pipeline
Constructs and executes a series of MapReduce jobs in order to write data to the output targets.
run() - Method in class org.apache.crunch.util.CrunchTool
 
runAsync() - Method in class org.apache.crunch.impl.mem.MemPipeline
 
runAsync() - Method in class org.apache.crunch.impl.mr.MRPipeline
 
runAsync() - Method in class org.apache.crunch.impl.spark.SparkPipeline
 
runAsync() - Method in interface org.apache.crunch.Pipeline
Constructs and starts a series of MapReduce jobs in order to write data to the output targets, but returns a ListenableFuture to allow clients to control job execution.
runAsync() - Method in class org.apache.crunch.util.CrunchTool
 

S

Sample - Class in org.apache.crunch.lib
Methods for performing random sampling in a distributed fashion, either by accepting each record in a PCollection with an independent probability in order to sample some fraction of the overall data set, or by using reservoir sampling in order to pull a uniform or weighted sample of fixed size from a PCollection of an unknown size.
Sample() - Constructor for class org.apache.crunch.lib.Sample
 
sample(PCollection<S>, double) - Static method in class org.apache.crunch.lib.Sample
Output records from the given PCollection with the given probability.
sample(PCollection<S>, Long, double) - Static method in class org.apache.crunch.lib.Sample
Output records from the given PCollection using a given seed.
sample(PTable<K, V>, double) - Static method in class org.apache.crunch.lib.Sample
A PTable<K, V> analogue of the sample function.
sample(PTable<K, V>, Long, double) - Static method in class org.apache.crunch.lib.Sample
A PTable<K, V> analogue of the sample function, with the seed argument exposed for testing purposes.
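The sample entries above describe keeping each record with an independent probability, optionally with a fixed seed. A minimal plain-Java sketch of that idea follows; it operates on an in-memory Iterable rather than a PCollection, and the class name BernoulliSampleSketch and its method signature are assumptions for illustration, not Crunch's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of the Crunch API.
final class BernoulliSampleSketch {

    // Keeps each element independently with the given probability.
    static <T> List<T> sample(Iterable<T> input, long seed, double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        Random random = new Random(seed); // fixed seed gives a repeatable selection
        List<T> kept = new ArrayList<>();
        for (T item : input) {
            // nextDouble() is in [0, 1), so 1.0 keeps everything and 0.0 keeps nothing.
            if (random.nextDouble() < probability) {
                kept.add(item);
            }
        }
        return kept;
    }
}
```

Unlike reservoir sampling, the size of the output here is itself random; only its expected fraction of the input is fixed.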
SAMPLE_UNIQUE_ELEMENTS(int) - Static method in class org.apache.crunch.fn.Aggregators
Collect a sample of unique elements from the input, where 'unique' is defined by the equals method for the input objects.
scaleFactor() - Method in class org.apache.crunch.DoFn
Returns an estimate of how applying this function to a PCollection will cause it to change in size.
scaleFactor() - Method in class org.apache.crunch.FilterFn
 
scaleFactor() - Method in class org.apache.crunch.MapFn
 
second() - Method in class org.apache.crunch.Pair
 
second() - Method in class org.apache.crunch.Tuple3
 
second() - Method in class org.apache.crunch.Tuple4
 
SecondarySort - Class in org.apache.crunch.lib
Utilities for performing a secondary sort on a PTable<K, Pair<V1, V2>> collection.
SecondarySort() - Constructor for class org.apache.crunch.lib.SecondarySort
 
SecondarySortExample - Class in org.apache.crunch.examples
 
SecondarySortExample() - Constructor for class org.apache.crunch.examples.SecondarySortExample
 
sequenceFile(String, Class<T>) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s).
sequenceFile(Path, Class<T>) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s).
sequenceFile(String, PType<T>) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s).
sequenceFile(Path, PType<T>) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s).
sequenceFile(String, Class<K>, Class<V>) - Static method in class org.apache.crunch.io.At
Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name from the key-value pairs in the SequenceFile(s).
sequenceFile(Path, Class<K>, Class<V>) - Static method in class org.apache.crunch.io.At
Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path from the key-value pairs in the SequenceFile(s).
sequenceFile(String, PType<K>, PType<V>) - Static method in class org.apache.crunch.io.At
Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name from the key-value pairs in the SequenceFile(s).
sequenceFile(Path, PType<K>, PType<V>) - Static method in class org.apache.crunch.io.At
Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path from the key-value pairs in the SequenceFile(s).
sequenceFile(String, Class<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s).
sequenceFile(Path, Class<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s).
sequenceFile(List<Path>, Class<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance from the SequenceFile(s) at the given Paths from the value field of each key-value pair in the SequenceFile(s).
sequenceFile(String, PType<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s).
sequenceFile(Path, PType<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s).
sequenceFile(List<Path>, PType<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance from the SequenceFile(s) at the given Paths from the value field of each key-value pair in the SequenceFile(s).
sequenceFile(String, Class<K>, Class<V>) - Static method in class org.apache.crunch.io.From
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.
sequenceFile(Path, Class<K>, Class<V>) - Static method in class org.apache.crunch.io.From
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.
sequenceFile(List<Path>, Class<K>, Class<V>) - Static method in class org.apache.crunch.io.From
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths.
sequenceFile(String, PType<K>, PType<V>) - Static method in class org.apache.crunch.io.From
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.
sequenceFile(Path, PType<K>, PType<V>) - Static method in class org.apache.crunch.io.From
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.
sequenceFile(List<Path>, PType<K>, PType<V>) - Static method in class org.apache.crunch.io.From
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths.
sequenceFile(String) - Static method in class org.apache.crunch.io.To
Creates a Target at the given path name that writes data to SequenceFiles.
sequenceFile(Path) - Static method in class org.apache.crunch.io.To
Creates a Target at the given Path that writes data to SequenceFiles.
SequentialFileNamingScheme - Class in org.apache.crunch.io
Default FileNamingScheme that uses an incrementing sequence number in order to generate unique file names.
SerDe<T> - Interface in org.apache.crunch.impl.spark.serde
 
SerializableSupplier<T> - Interface in org.apache.crunch.util
An extension of Guava's Supplier interface that indicates that an instance will also implement Serializable, which makes this object suitable for use with Crunch's DoFns when we need to construct an instance of a non-serializable type for use in processing.
serialize() - Method in class org.apache.crunch.io.FormatBundle
 
set(String, String) - Method in class org.apache.crunch.io.FormatBundle
 
Set - Class in org.apache.crunch.lib
Utilities for performing set operations (difference, intersection, etc) on PCollection instances.
Set() - Constructor for class org.apache.crunch.lib.Set
 
set(int, Writable) - Method in class org.apache.crunch.types.writable.TupleWritable
 
setBreakpoint() - Method in class org.apache.crunch.impl.dist.collect.BaseUnionCollection
 
setBreakpoint() - Method in class org.apache.crunch.impl.dist.collect.BaseUnionTable
 
setBreakpoint() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
setCombineFn(CombineFn) - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
setConf(Broadcast<byte[]>) - Method in class org.apache.crunch.impl.spark.SparkRuntimeContext
 
setConf(Configuration) - Method in class org.apache.crunch.io.FormatBundle
 
setConf(Configuration) - Method in class org.apache.crunch.lib.join.JoinUtils.AvroPairGroupingComparator
 
setConf(Configuration) - Method in class org.apache.crunch.lib.sort.ReverseAvroComparator
 
setConf(Configuration) - Method in class org.apache.crunch.lib.sort.ReverseWritableComparator
 
setConf(Configuration) - Method in class org.apache.crunch.lib.sort.TotalOrderPartitioner
 
setConf(Configuration) - Method in class org.apache.crunch.lib.sort.TupleWritableComparator
 
setConf(Configuration) - Method in class org.apache.crunch.types.writable.TupleWritable
 
setConf(Configuration) - Method in class org.apache.crunch.util.CrunchTool
 
setConfiguration(Configuration) - Method in class org.apache.crunch.DoFn
Called during the setup of an initialized PType that relies on this instance.
setConfiguration(Configuration) - Method in class org.apache.crunch.fn.CompositeMapFn
 
setConfiguration(Configuration) - Method in class org.apache.crunch.fn.ExtractKeyFn
 
setConfiguration(Configuration) - Method in class org.apache.crunch.fn.PairMapFn
 
setConfiguration(Configuration) - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
setConfiguration(Configuration) - Method in class org.apache.crunch.impl.mem.MemPipeline
 
setConfiguration(Configuration) - Method in interface org.apache.crunch.Pipeline
Set the Configuration to use with this pipeline.
setContext(TaskInputOutputContext<?, ?, ?, ?>) - Method in class org.apache.crunch.DoFn
Called during setup to pass the TaskInputOutputContext to this DoFn instance.
setContext(TaskInputOutputContext<?, ?, ?, ?>) - Method in class org.apache.crunch.fn.CompositeMapFn
 
setContext(TaskInputOutputContext<?, ?, ?, ?>) - Method in class org.apache.crunch.fn.ExtractKeyFn
 
setContext(TaskInputOutputContext<?, ?, ?, ?>) - Method in class org.apache.crunch.fn.PairMapFn
 
setContext(TaskInputOutputContext<?, ?, ?, ?>) - Method in class org.apache.crunch.types.PGroupedTableType.PairIterableMapFn
 
setPartitionFile(Configuration, Path) - Static method in class org.apache.crunch.lib.sort.TotalOrderPartitioner
 
setSpecificClassLoader(ClassLoader) - Static method in class org.apache.crunch.types.avro.AvroMode
 
setValue(long) - Method in class org.apache.hadoop.mapred.SparkCounter
 
Shard - Class in org.apache.crunch.lib
Utilities for controlling how the data in a PCollection is balanced across reducers and output files.
Shard() - Constructor for class org.apache.crunch.lib.Shard
 
shard(PCollection<T>, int) - Static method in class org.apache.crunch.lib.Shard
Creates a PCollection<T> that has the same contents as its input argument but will be written to a fixed number of output files.
ShardedJoinStrategy<K,U,V> - Class in org.apache.crunch.lib.join
JoinStrategy that splits the key space up into shards.
ShardedJoinStrategy(int) - Constructor for class org.apache.crunch.lib.join.ShardedJoinStrategy
Instantiate with a constant number of shards to use for all keys.
ShardedJoinStrategy(ShardedJoinStrategy.ShardingStrategy<K>) - Constructor for class org.apache.crunch.lib.join.ShardedJoinStrategy
Instantiate with a custom sharding strategy.
ShardedJoinStrategy.ShardingStrategy<K> - Interface in org.apache.crunch.lib.join
Determines over how many shards a key will be split in a sharded join.
SingleUseIterable<T> - Class in org.apache.crunch.impl
Wrapper around a Reducer's input Iterable.
SingleUseIterable(Iterable<T>) - Constructor for class org.apache.crunch.impl.SingleUseIterable
Instantiate around an Iterable that may only be used once.
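The "may only be used once" contract in the SingleUseIterable entries can be pictured with a small wrapper. This is a sketch under assumptions: the class name SingleUseIterableSketch is hypothetical, and throwing IllegalStateException on a second call to iterator() is one plausible way to enforce the contract, not a statement of what the Crunch class does.

```java
import java.util.Iterator;

// Hypothetical stand-in, not the Crunch class itself.
final class SingleUseIterableSketch<T> implements Iterable<T> {

    private final Iterable<T> wrapped;
    private boolean used = false;

    SingleUseIterableSketch(Iterable<T> toWrap) {
        this.wrapped = toWrap;
    }

    @Override
    public Iterator<T> iterator() {
        // Assumed behaviour: fail loudly on a second traversal instead of
        // silently returning an exhausted or restarted iterator.
        if (used) {
            throw new IllegalStateException("iterator() may only be called once");
        }
        used = true;
        return wrapped.iterator();
    }
}
```

A guard like this is useful when the underlying data, such as a reducer's value stream, cannot be replayed after the first pass.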
size() - Method in class org.apache.crunch.Pair
 
size() - Method in interface org.apache.crunch.Tuple
Returns the number of elements in this Tuple.
size() - Method in class org.apache.crunch.Tuple3
 
size() - Method in class org.apache.crunch.Tuple4
 
size() - Method in class org.apache.crunch.TupleN
 
size() - Method in class org.apache.crunch.types.writable.TupleWritable
The number of children in this Tuple.
skip(String) - Method in class org.apache.crunch.contrib.text.TokenizerFactory.Builder
Sets the regular expression that determines which input characters should be ignored by the Scanner that is returned by the constructed TokenizerFactory.
Sort - Class in org.apache.crunch.lib
Utilities for sorting PCollection instances.
Sort() - Constructor for class org.apache.crunch.lib.Sort
 
sort(PCollection<T>) - Static method in class org.apache.crunch.lib.Sort
Sorts the PCollection using the natural ordering of its elements in ascending order.
sort(PCollection<T>, Sort.Order) - Static method in class org.apache.crunch.lib.Sort
Sorts the PCollection using the natural ordering of its elements with the given Order.
sort(PCollection<T>, int, Sort.Order) - Static method in class org.apache.crunch.lib.Sort
Sorts the PCollection using the natural ordering of its elements in the specified order, using the given number of reducers.
sort(PTable<K, V>) - Static method in class org.apache.crunch.lib.Sort
Sorts the PTable using the natural ordering of its keys in ascending order.
sort(PTable<K, V>, Sort.Order) - Static method in class org.apache.crunch.lib.Sort
Sorts the PTable using the natural ordering of its keys with the given Order.
sort(PTable<K, V>, int, Sort.Order) - Static method in class org.apache.crunch.lib.Sort
Sorts the PTable using the natural ordering of its keys in the specified order, with a client-specified number of reducers.
Sort.ColumnOrder - Class in org.apache.crunch.lib
To sort by column 2 ascending then column 1 descending, you would use: sortPairs(coll, by(2, ASCENDING), by(1, DESCENDING)). Column numbering is 1-based.
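The column-ordering semantics described above (1-based columns, earlier orderings taking priority) can be illustrated with a small plain-Java sketch. It does not use the Crunch API: the class ColumnOrderSketch, its Order enum, and the helpers by and sortRows are hypothetical names introduced only for illustration, and rows are modeled as String[] compared lexicographically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java analogy of a multi-column sort; NOT the Crunch implementation.
class ColumnOrderSketch {
    // Hypothetical stand-in for a sort direction.
    public enum Order { ASCENDING, DESCENDING }

    // Hypothetical helper: a comparator on one column of a row.
    // Column numbering is 1-based, so column 1 is row[0].
    static Comparator<String[]> by(int column, Order order) {
        if (column < 1) {
            throw new IllegalArgumentException("column numbering is 1-based");
        }
        Comparator<String[]> c = Comparator.comparing((String[] row) -> row[column - 1]);
        return order == Order.ASCENDING ? c : c.reversed();
    }

    // Hypothetical helper: sorts a copy of the rows; the first ordering has
    // the highest priority and later orderings only break ties.
    @SafeVarargs
    static List<String[]> sortRows(List<String[]> rows, Comparator<String[]>... orders) {
        Comparator<String[]> combined = (a, b) -> 0;
        for (Comparator<String[]> o : orders) {
            combined = combined.thenComparing(o);
        }
        List<String[]> out = new ArrayList<>(rows);
        out.sort(combined);
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"b", "1"}, new String[] {"a", "2"},
            new String[] {"c", "1"}, new String[] {"a", "1"});
        // Column 2 ascending, then column 1 descending.
        for (String[] row : sortRows(rows, by(2, Order.ASCENDING), by(1, Order.DESCENDING))) {
            System.out.println(String.join(",", row));
        }
        // prints: c,1  b,1  a,1  a,2 (one row per line)
    }
}
```

In an actual pipeline the sort would run as a distributed job rather than on an in-memory list; the sketch only shows how the column orderings combine.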
Sort.ColumnOrder(int, Sort.Order) - Constructor for class org.apache.crunch.lib.Sort.ColumnOrder
 
Sort.Order - Enum in org.apache.crunch.lib
For signaling the order in which a sort should be done.
sortAndApply(PTable<K, Pair<V1, V2>>, DoFn<Pair<K, Iterable<Pair<V1, V2>>>, T>, PType<T>) - Static method in class org.apache.crunch.lib.SecondarySort
Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PCollection<T>.
sortAndApply(PTable<K, Pair<V1, V2>>, DoFn<Pair<K, Iterable<Pair<V1, V2>>>, T>, PType<T>, int) - Static method in class org.apache.crunch.lib.SecondarySort
Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PCollection<T>, using the given number of reducers.
sortAndApply(PTable<K, Pair<V1, V2>>, DoFn<Pair<K, Iterable<Pair<V1, V2>>>, Pair<U, V>>, PTableType<U, V>) - Static method in class org.apache.crunch.lib.SecondarySort
Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PTable<U, V>.
sortAndApply(PTable<K, Pair<V1, V2>>, DoFn<Pair<K, Iterable<Pair<V1, V2>>>, Pair<U, V>>, PTableType<U, V>, int) - Static method in class org.apache.crunch.lib.SecondarySort
Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PTable<U, V>, using the given number of reducers.
sortComparatorClass(Class<? extends RawComparator>) - Method in class org.apache.crunch.GroupingOptions.Builder
 
SortExample - Class in org.apache.crunch.examples
Simple Crunch tool for running sorting examples from the command line.
SortExample() - Constructor for class org.apache.crunch.examples.SortExample
 
SortFns - Class in org.apache.crunch.lib.sort
A set of DoFns that are used by Crunch's Sort library.
SortFns() - Constructor for class org.apache.crunch.lib.sort.SortFns
 
SortFns.AvroGenericFn<V extends Tuple> - Class in org.apache.crunch.lib.sort
Pulls a composite set of keys from an Avro GenericRecord instance.
SortFns.AvroGenericFn(int[], Schema) - Constructor for class org.apache.crunch.lib.sort.SortFns.AvroGenericFn
 
SortFns.KeyExtraction<V extends Tuple> - Class in org.apache.crunch.lib.sort
Utility class for encapsulating key extraction logic and serialization information about key extraction.
SortFns.KeyExtraction(PType<V>, Sort.ColumnOrder[]) - Constructor for class org.apache.crunch.lib.sort.SortFns.KeyExtraction
 
SortFns.SingleKeyFn<V extends Tuple,K> - Class in org.apache.crunch.lib.sort
Extracts a single indexed key from a Tuple instance.
SortFns.SingleKeyFn(int) - Constructor for class org.apache.crunch.lib.sort.SortFns.SingleKeyFn
 
SortFns.TupleKeyFn<V extends Tuple,K extends Tuple> - Class in org.apache.crunch.lib.sort
Extracts a composite key from a Tuple instance.
SortFns.TupleKeyFn(int[], TupleFactory) - Constructor for class org.apache.crunch.lib.sort.SortFns.TupleKeyFn
 
sortPairs(PCollection<Pair<U, V>>, Sort.ColumnOrder...) - Static method in class org.apache.crunch.lib.Sort
Sorts the PCollection of Pairs using the specified column ordering.
sortQuads(PCollection<Tuple4<V1, V2, V3, V4>>, Sort.ColumnOrder...) - Static method in class org.apache.crunch.lib.Sort
Sorts the PCollection of Tuple4s using the specified column ordering.
sortTriples(PCollection<Tuple3<V1, V2, V3>>, Sort.ColumnOrder...) - Static method in class org.apache.crunch.lib.Sort
Sorts the PCollection of Tuple3s using the specified column ordering.
sortTuples(PCollection<T>, Sort.ColumnOrder...) - Static method in class org.apache.crunch.lib.Sort
Sorts the PCollection of tuples using the specified column ordering.
sortTuples(PCollection<T>, int, Sort.ColumnOrder...) - Static method in class org.apache.crunch.lib.Sort
Sorts the PCollection of TupleNs using the specified column ordering and a client-specified number of reducers.
Source<T> - Interface in org.apache.crunch
A Source represents an input data set that is an input to one or more MapReduce jobs.
sources(Source<?>...) - Method in class org.apache.crunch.ParallelDoOptions.Builder
 
sources(Collection<Source<?>>) - Method in class org.apache.crunch.ParallelDoOptions.Builder
 
sourceTarget(SourceTarget<?>) - Method in class org.apache.crunch.GroupingOptions.Builder
Deprecated. 
SourceTarget<T> - Interface in org.apache.crunch
An interface for classes that implement both the Source and the Target interfaces.
SourceTargetHelper - Class in org.apache.crunch.io
Functions for configuring the inputs/outputs of MapReduce jobs.
SourceTargetHelper() - Constructor for class org.apache.crunch.io.SourceTargetHelper
 
sourceTargets(SourceTarget<?>...) - Method in class org.apache.crunch.GroupingOptions.Builder
 
sourceTargets(Collection<SourceTarget<?>>) - Method in class org.apache.crunch.GroupingOptions.Builder
 
sourceTargets(SourceTarget<?>...) - Method in class org.apache.crunch.ParallelDoOptions.Builder
 
sourceTargets(Collection<SourceTarget<?>>) - Method in class org.apache.crunch.ParallelDoOptions.Builder
 
SparkCollectFactory - Class in org.apache.crunch.impl.spark.collect
 
SparkCollectFactory() - Constructor for class org.apache.crunch.impl.spark.collect.SparkCollectFactory
 
SparkCollection - Interface in org.apache.crunch.impl.spark
 
SparkComparator - Class in org.apache.crunch.impl.spark
 
SparkComparator(GroupingOptions, PGroupedTableType, SparkRuntimeContext) - Constructor for class org.apache.crunch.impl.spark.SparkComparator
 
SparkCounter - Class in org.apache.hadoop.mapred
 
SparkCounter(String, String, Accumulator<Map<String, Map<String, Long>>>) - Constructor for class org.apache.hadoop.mapred.SparkCounter
 
SparkCounter(String, String, long) - Constructor for class org.apache.hadoop.mapred.SparkCounter
 
SparkPartitioner - Class in org.apache.crunch.impl.spark
 
SparkPartitioner(int) - Constructor for class org.apache.crunch.impl.spark.SparkPartitioner
 
SparkPipeline - Class in org.apache.crunch.impl.spark
 
SparkPipeline(String, String) - Constructor for class org.apache.crunch.impl.spark.SparkPipeline
 
SparkPipeline(String, String, Class<?>) - Constructor for class org.apache.crunch.impl.spark.SparkPipeline
 
SparkPipeline(JavaSparkContext, String) - Constructor for class org.apache.crunch.impl.spark.SparkPipeline
 
SparkRuntime - Class in org.apache.crunch.impl.spark
 
SparkRuntime(SparkPipeline, JavaSparkContext, Configuration, Map<PCollectionImpl<?>, Set<Target>>, Map<PCollectionImpl<?>, MaterializableIterable>, Map<PCollection<?>, StorageLevel>) - Constructor for class org.apache.crunch.impl.spark.SparkRuntime
 
SparkRuntimeContext - Class in org.apache.crunch.impl.spark
 
SparkRuntimeContext(Accumulator<Map<String, Map<String, Long>>>, Broadcast<byte[]>) - Constructor for class org.apache.crunch.impl.spark.SparkRuntimeContext
 
SPECIFIC - Static variable in class org.apache.crunch.types.avro.AvroMode
Default mode to use for reading and writing Specific types.
specifics(Class<T>) - Static method in class org.apache.crunch.types.avro.Avros
 
split(PCollection<Pair<T, U>>) - Static method in class org.apache.crunch.lib.Channels
Splits a PCollection of any Pair of objects into a Pair of PCollections, to allow for the output of a DoFn to be handled using separate channels.
split(PCollection<Pair<T, U>>, PType<T>, PType<U>) - Static method in class org.apache.crunch.lib.Channels
Splits a PCollection of any Pair of objects into a Pair of PCollections, to allow for the output of a DoFn to be handled using separate channels.
status - Variable in class org.apache.crunch.PipelineResult
 
STRING_CONCAT(String, boolean) - Static method in class org.apache.crunch.fn.Aggregators
Concatenate strings, with a separator between strings.
STRING_CONCAT(String, boolean, long, long) - Static method in class org.apache.crunch.fn.Aggregators
Concatenate strings, with a separator between strings.
STRING_TO_UTF8 - Static variable in class org.apache.crunch.types.avro.Avros
 
strings() - Static method in class org.apache.crunch.types.avro.Avros
 
strings() - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
strings() - Method in interface org.apache.crunch.types.PTypeFamily
 
strings() - Static method in class org.apache.crunch.types.writable.Writables
 
strings() - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
succeeded() - Method in class org.apache.crunch.PipelineResult
 
SUM_BIGINTS() - Static method in class org.apache.crunch.fn.Aggregators
Sum up all BigInteger values.
SUM_DOUBLES() - Static method in class org.apache.crunch.fn.Aggregators
Sum up all double values.
SUM_FLOATS() - Static method in class org.apache.crunch.fn.Aggregators
Sum up all float values.
SUM_INTS() - Static method in class org.apache.crunch.fn.Aggregators
Sum up all int values.
SUM_LONGS() - Static method in class org.apache.crunch.fn.Aggregators
Sum up all long values.

T

tableOf(S, T, Object...) - Static method in class org.apache.crunch.impl.mem.MemPipeline
 
tableOf(Iterable<Pair<S, T>>) - Static method in class org.apache.crunch.impl.mem.MemPipeline
 
tableOf(PType<K>, PType<V>) - Static method in class org.apache.crunch.types.avro.Avros
 
tableOf(PType<K>, PType<V>) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
tableOf(PType<K>, PType<V>) - Method in interface org.apache.crunch.types.PTypeFamily
 
tableOf(PType<K>, PType<V>) - Static method in class org.apache.crunch.types.writable.Writables
 
tableOf(PType<K>, PType<V>) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
TableSource<K,V> - Interface in org.apache.crunch
The interface for Source implementations that return a PTable.
TableSourceTarget<K,V> - Interface in org.apache.crunch
An interface for classes that implement both the TableSource and the Target interfaces.
Target - Interface in org.apache.crunch
A Target represents the output destination of a Crunch PCollection in the context of a Crunch job.
Target.WriteMode - Enum in org.apache.crunch
An enum to represent different options the client may specify for handling the case where the output path, table, etc.
tempDir - Variable in class org.apache.crunch.test.CrunchTestSupport
 
TemporaryPath - Class in org.apache.crunch.test
Creates a temporary directory for a test case and destroys it afterwards.
TemporaryPath(String...) - Constructor for class org.apache.crunch.test.TemporaryPath
Construct TemporaryPath.
TestCounters - Class in org.apache.crunch.test
A utility class used during unit testing to update and read counters.
TestCounters() - Constructor for class org.apache.crunch.test.TestCounters
 
textFile(String) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<String> instance for the text file(s) at the given path name.
textFile(Path) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<String> instance for the text file(s) at the given Path.
textFile(String, PType<T>) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text.
textFile(Path, PType<T>) - Static method in class org.apache.crunch.io.At
Creates a SourceTarget<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text.
textFile(String) - Static method in class org.apache.crunch.io.From
Creates a Source<String> instance for the text file(s) at the given path name.
textFile(Path) - Static method in class org.apache.crunch.io.From
Creates a Source<String> instance for the text file(s) at the given Path.
textFile(List<Path>) - Static method in class org.apache.crunch.io.From
Creates a Source<String> instance for the text file(s) at the given Paths.
textFile(String, PType<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text.
textFile(Path, PType<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text.
textFile(List<Path>, PType<T>) - Static method in class org.apache.crunch.io.From
Creates a Source<T> instance for the text file(s) at the given Paths using the provided PType<T> to convert the input text.
textFile(String) - Static method in class org.apache.crunch.io.To
Creates a Target at the given path name that writes data to text files.
textFile(Path) - Static method in class org.apache.crunch.io.To
Creates a Target at the given Path that writes data to text files.
third() - Method in class org.apache.crunch.Tuple3
 
third() - Method in class org.apache.crunch.Tuple4
 
thrifts(Class<T>, PTypeFamily) - Static method in class org.apache.crunch.types.PTypes
Constructs a PType for a Thrift record.
To - Class in org.apache.crunch.io
Static factory methods for creating common Target types.
To() - Constructor for class org.apache.crunch.io.To
 
ToByteArrayFunction - Class in org.apache.crunch.impl.spark.collect
 
ToByteArrayFunction() - Constructor for class org.apache.crunch.impl.spark.collect.ToByteArrayFunction
 
toBytes(T) - Method in class org.apache.crunch.impl.spark.serde.AvroSerDe
 
toBytes(T) - Method in interface org.apache.crunch.impl.spark.serde.SerDe
 
toBytes(Writable) - Method in class org.apache.crunch.impl.spark.serde.WritableSerDe
 
toCombineFn(Aggregator<V>) - Static method in class org.apache.crunch.fn.Aggregators
Wrap a CombineFn adapter around the given aggregator.
Tokenizer - Class in org.apache.crunch.contrib.text
Manages a Scanner instance and provides support for returning only a subset of the fields returned by the underlying Scanner.
Tokenizer(Scanner, Set<Integer>, boolean) - Constructor for class org.apache.crunch.contrib.text.Tokenizer
Create a new Tokenizer instance.
TokenizerFactory - Class in org.apache.crunch.contrib.text
Factory class that constructs Tokenizer instances for input strings that use a fixed set of delimiters, skip patterns, locales, and sets of indices to keep or drop.
TokenizerFactory.Builder - Class in org.apache.crunch.contrib.text
A class for constructing new TokenizerFactory instances using the Builder pattern.
TokenizerFactory.Builder() - Constructor for class org.apache.crunch.contrib.text.TokenizerFactory.Builder
 
top(int) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
top(PTable<K, V>, int, boolean) - Static method in class org.apache.crunch.lib.Aggregate
Selects the top N pairs from the given table, with sorting being performed on the values (i.e.
top(int) - Method in interface org.apache.crunch.PTable
Returns a PTable made up of the pairs in this PTable with the largest value field.
toString() - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
toString() - Method in class org.apache.crunch.lib.Sort.ColumnOrder
 
toString() - Method in class org.apache.crunch.Pair
 
toString() - Method in class org.apache.crunch.Tuple3
 
toString() - Method in class org.apache.crunch.Tuple4
 
toString() - Method in class org.apache.crunch.TupleN
 
toString() - Method in class org.apache.crunch.types.writable.TupleWritable
Convert Tuple to String as in the following.
TotalBytesByIP - Class in org.apache.crunch.examples
 
TotalBytesByIP() - Constructor for class org.apache.crunch.examples.TotalBytesByIP
 
TotalOrderPartitioner<K,V> - Class in org.apache.crunch.lib.sort
A partition-aware Partitioner instance that can work with either Avro or Writable-formatted keys.
TotalOrderPartitioner() - Constructor for class org.apache.crunch.lib.sort.TotalOrderPartitioner
 
TotalWordCount - Class in org.apache.crunch.examples
 
TotalWordCount() - Constructor for class org.apache.crunch.examples.TotalWordCount
 
tripAggregator(Aggregator<V1>, Aggregator<V2>, Aggregator<V3>) - Static method in class org.apache.crunch.fn.Aggregators
Apply separate aggregators to each component of a Tuple3.
triples(PType<V1>, PType<V2>, PType<V3>) - Static method in class org.apache.crunch.types.avro.Avros
 
triples(PType<V1>, PType<V2>, PType<V3>) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
triples(PType<V1>, PType<V2>, PType<V3>) - Method in interface org.apache.crunch.types.PTypeFamily
 
triples(PType<V1>, PType<V2>, PType<V3>) - Static method in class org.apache.crunch.types.writable.Writables
 
triples(PType<V1>, PType<V2>, PType<V3>) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
Tuple - Interface in org.apache.crunch
A fixed-size collection of Objects, used in Crunch for representing joins between PCollections.
tuple2PairFunc() - Static method in class org.apache.crunch.impl.spark.GuavaUtils
 
Tuple3<V1,V2,V3> - Class in org.apache.crunch
A convenience class for three-element Tuples.
Tuple3(V1, V2, V3) - Constructor for class org.apache.crunch.Tuple3
 
TUPLE3 - Static variable in class org.apache.crunch.types.TupleFactory
 
Tuple3.Collect<V1,V2,V3> - Class in org.apache.crunch
 
Tuple3.Collect(Collection<V1>, Collection<V2>, Collection<V3>) - Constructor for class org.apache.crunch.Tuple3.Collect
 
Tuple4<V1,V2,V3,V4> - Class in org.apache.crunch
A convenience class for four-element Tuples.
Tuple4(V1, V2, V3, V4) - Constructor for class org.apache.crunch.Tuple4
 
TUPLE4 - Static variable in class org.apache.crunch.types.TupleFactory
 
Tuple4.Collect<V1,V2,V3,V4> - Class in org.apache.crunch
 
Tuple4.Collect(Collection<V1>, Collection<V2>, Collection<V3>, Collection<V4>) - Constructor for class org.apache.crunch.Tuple4.Collect
 
tupleAggregator(Aggregator<?>...) - Static method in class org.apache.crunch.fn.Aggregators
Apply separate aggregators to each component of a Tuple.
TupleDeepCopier<T extends Tuple> - Class in org.apache.crunch.types
Performs deep copies (based on underlying PType deep copying) of Tuple-based objects.
TupleDeepCopier(Class<T>, PType...) - Constructor for class org.apache.crunch.types.TupleDeepCopier
 
TupleFactory<T extends Tuple> - Class in org.apache.crunch.types
 
TupleFactory() - Constructor for class org.apache.crunch.types.TupleFactory
 
TupleN - Class in org.apache.crunch
A Tuple instance for an arbitrary number of values.
TupleN(Object...) - Constructor for class org.apache.crunch.TupleN
 
TUPLEN - Static variable in class org.apache.crunch.types.TupleFactory
 
tuples(PType...) - Static method in class org.apache.crunch.types.avro.Avros
 
tuples(Class<T>, PType...) - Static method in class org.apache.crunch.types.avro.Avros
 
tuples(PType<?>...) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
tuples(Class<T>, PType<?>...) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
tuples(PType<?>...) - Method in interface org.apache.crunch.types.PTypeFamily
 
tuples(Class<T>, PType<?>...) - Method in interface org.apache.crunch.types.PTypeFamily
 
tuples(PType...) - Static method in class org.apache.crunch.types.writable.Writables
 
tuples(Class<T>, PType...) - Static method in class org.apache.crunch.types.writable.Writables
 
tuples(PType<?>...) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
tuples(Class<T>, PType<?>...) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
Tuples - Class in org.apache.crunch.util
Utilities for working with subclasses of the Tuple interface.
Tuples() - Constructor for class org.apache.crunch.util.Tuples
 
Tuples.PairIterable<S,T> - Class in org.apache.crunch.util
 
Tuples.PairIterable(Iterable<S>, Iterable<T>) - Constructor for class org.apache.crunch.util.Tuples.PairIterable
 
Tuples.QuadIterable<A,B,C,D> - Class in org.apache.crunch.util
 
Tuples.QuadIterable(Iterable<A>, Iterable<B>, Iterable<C>, Iterable<D>) - Constructor for class org.apache.crunch.util.Tuples.QuadIterable
 
Tuples.TripIterable<A,B,C> - Class in org.apache.crunch.util
 
Tuples.TripIterable(Iterable<A>, Iterable<B>, Iterable<C>) - Constructor for class org.apache.crunch.util.Tuples.TripIterable
 
Tuples.TupleNIterable - Class in org.apache.crunch.util
 
Tuples.TupleNIterable(Iterable<?>...) - Constructor for class org.apache.crunch.util.Tuples.TupleNIterable
 
TupleWritable - Class in org.apache.crunch.types.writable
A serialization format for Tuple.
TupleWritable() - Constructor for class org.apache.crunch.types.writable.TupleWritable
Create an empty tuple with no allocated storage for writables.
TupleWritable(Writable[]) - Constructor for class org.apache.crunch.types.writable.TupleWritable
 
TupleWritable(Writable[], int[]) - Constructor for class org.apache.crunch.types.writable.TupleWritable
Initialize tuple with storage; unknown whether any of them contain "written" values.
TupleWritable.Comparator - Class in org.apache.crunch.types.writable
 
TupleWritableComparator - Class in org.apache.crunch.lib.sort
 
TupleWritableComparator() - Constructor for class org.apache.crunch.lib.sort.TupleWritableComparator
 
typedCollectionOf(PType<T>, T...) - Static method in class org.apache.crunch.impl.mem.MemPipeline
 
typedCollectionOf(PType<T>, Iterable<T>) - Static method in class org.apache.crunch.impl.mem.MemPipeline
 
typedTableOf(PTableType<S, T>, S, T, Object...) - Static method in class org.apache.crunch.impl.mem.MemPipeline
 
typedTableOf(PTableType<S, T>, Iterable<Pair<S, T>>) - Static method in class org.apache.crunch.impl.mem.MemPipeline
 

U

ungroup() - Method in class org.apache.crunch.impl.dist.collect.BaseGroupedTable
 
ungroup() - Method in interface org.apache.crunch.PGroupedTable
Convert this grouping back into a multimap.
union(PCollection<S>) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
union(PCollection<S>...) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
union(PTable<K, V>) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
union(PTable<K, V>...) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
union(PCollection<S>) - Method in interface org.apache.crunch.PCollection
Returns a PCollection instance that acts as the union of this PCollection and the given PCollection.
union(PCollection<S>...) - Method in interface org.apache.crunch.PCollection
Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.
union(PTable<K, V>) - Method in interface org.apache.crunch.PTable
Returns a PTable instance that acts as the union of this PTable and the given PTable.
union(PTable<K, V>...) - Method in interface org.apache.crunch.PTable
Returns a PTable instance that acts as the union of this PTable and the input PTables.
Union - Class in org.apache.crunch
Allows us to represent the combination of multiple data sources that may contain different types of data as a single type with an index to indicate which of the original sources the current record was from.
Union(int, Object) - Constructor for class org.apache.crunch.Union
 
UnionCollection<S> - Class in org.apache.crunch.impl.spark.collect
 
UnionDeepCopier - Class in org.apache.crunch.types
 
UnionDeepCopier(PType...) - Constructor for class org.apache.crunch.types.UnionDeepCopier
 
unionOf(PType<?>...) - Static method in class org.apache.crunch.types.avro.Avros
 
unionOf(PType<?>...) - Method in class org.apache.crunch.types.avro.AvroTypeFamily
 
unionOf(PType<?>...) - Method in interface org.apache.crunch.types.PTypeFamily
 
unionOf(PType<?>...) - Static method in class org.apache.crunch.types.writable.Writables
 
unionOf(PType<?>...) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
UnionReadableData<T> - Class in org.apache.crunch.util
 
UnionReadableData(List<ReadableData<T>>) - Constructor for class org.apache.crunch.util.UnionReadableData
 
UnionTable<K,V> - Class in org.apache.crunch.impl.spark.collect
 
UnionWritable - Class in org.apache.crunch.types.writable
 
UnionWritable() - Constructor for class org.apache.crunch.types.writable.UnionWritable
 
UnionWritable(int, BytesWritable) - Constructor for class org.apache.crunch.types.writable.UnionWritable
 
UNIQUE_ELEMENTS() - Static method in class org.apache.crunch.fn.Aggregators
Collect the unique elements of the input, as defined by the equals method for the input objects.
update(T) - Method in interface org.apache.crunch.Aggregator
Incorporate the given value into the aggregate state maintained by this instance.
useDisk(boolean) - Method in class org.apache.crunch.CachingOptions.Builder
 
useDisk() - Method in class org.apache.crunch.CachingOptions
Whether the framework may cache data on disk.
useMemory(boolean) - Method in class org.apache.crunch.CachingOptions.Builder
 
useMemory() - Method in class org.apache.crunch.CachingOptions
Whether the framework may cache data in memory without writing it to disk.
UTF8_TO_STRING - Static variable in class org.apache.crunch.types.avro.Avros
 
uuid(PTypeFamily) - Static method in class org.apache.crunch.types.PTypes
A PType for Java's UUID type.

V

value - Variable in class org.apache.crunch.impl.spark.ByteArray
 
valueOf(String) - Static method in enum org.apache.crunch.impl.mr.MRJob.State
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.apache.crunch.lib.join.JoinType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.apache.crunch.lib.Sort.Order
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.apache.crunch.PipelineExecution.Status
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.apache.crunch.Target.WriteMode
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.apache.crunch.types.avro.AvroMode.ModeType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.apache.crunch.types.avro.AvroType.AvroRecordType
Returns the enum constant of this type with the specified name.
values() - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
values() - Static method in enum org.apache.crunch.impl.mr.MRJob.State
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.apache.crunch.lib.join.JoinType
Returns an array containing the constants of this enum type, in the order they are declared.
values(PTable<K, V>) - Static method in class org.apache.crunch.lib.PTables
Extract the values from the given PTable<K, V> as a PCollection<V>.
values() - Static method in enum org.apache.crunch.lib.Sort.Order
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.apache.crunch.PipelineExecution.Status
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Method in interface org.apache.crunch.PTable
Returns a PCollection made up of the values in this PTable.
values() - Static method in enum org.apache.crunch.Target.WriteMode
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.apache.crunch.types.avro.AvroMode.ModeType
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.apache.crunch.types.avro.AvroType.AvroRecordType
Returns an array containing the constants of this enum type, in the order they are declared.
visitDoCollection(BaseDoCollection<?>) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionImpl.Visitor
 
visitDoTable(BaseDoTable<?, ?>) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionImpl.Visitor
 
visitGroupedTable(BaseGroupedTable<?, ?>) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionImpl.Visitor
 
visitInputCollection(BaseInputCollection<?>) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionImpl.Visitor
 
visitUnionCollection(BaseUnionCollection<?>) - Method in interface org.apache.crunch.impl.dist.collect.PCollectionImpl.Visitor
 

W

waitFor(long, TimeUnit) - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
waitFor(long, TimeUnit) - Method in interface org.apache.crunch.PipelineExecution
Blocks until the pipeline completes or the specified waiting time elapses.
waitUntilDone() - Method in class org.apache.crunch.impl.spark.SparkRuntime
 
waitUntilDone() - Method in interface org.apache.crunch.PipelineExecution
Blocks until pipeline completes, i.e.
wasLogged() - Method in exception org.apache.crunch.CrunchRuntimeException
Returns true if this exception was written to the debug logs.
weightedReservoirSample(PCollection<Pair<T, N>>, int) - Static method in class org.apache.crunch.lib.Sample
Selects a weighted sample of the elements of the given PCollection, where the second term in the input Pair is a numerical weight.
weightedReservoirSample(PCollection<Pair<T, N>>, int, Long) - Static method in class org.apache.crunch.lib.Sample
The weighted reservoir sampling function with the seed term exposed for testing purposes.
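As a rough illustration of what a seeded, weighted reservoir sample does, here is a minimal plain-Java sketch. It is not taken from Crunch and makes no claim about how Sample is implemented: it assumes the keyed scheme usually attributed to Efraimidis and Spirakis (each item gets the key log(u) / weight for a uniform random u, and the k largest keys are kept), and the class WeightedReservoirSketch and its sample method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// In-memory sketch of weighted reservoir sampling; NOT the Crunch implementation.
class WeightedReservoirSketch {
    // Pairs an item with its random sort key.
    private static final class Keyed<T> {
        final double key;
        final T item;
        Keyed(double key, T item) {
            this.key = key;
            this.item = item;
        }
    }

    // Returns up to k items; items with larger weights are more likely to be
    // kept. The seed makes the result reproducible, which is handy in tests.
    static <T> List<T> sample(List<T> items, List<Double> weights, int k, long seed) {
        if (items.size() != weights.size()) {
            throw new IllegalArgumentException("items and weights differ in length");
        }
        Random rnd = new Random(seed);
        Comparator<Keyed<T>> byKey = Comparator.comparingDouble(e -> e.key);
        // Min-heap: the smallest key among the current k survivors is on top.
        PriorityQueue<Keyed<T>> heap = new PriorityQueue<>(byKey);
        for (int i = 0; i < items.size(); i++) {
            double w = weights.get(i);
            if (w <= 0) {
                continue; // assumption: non-positive weights are never sampled
            }
            double key = Math.log(rnd.nextDouble()) / w;
            if (heap.size() < k) {
                heap.add(new Keyed<>(key, items.get(i)));
            } else if (k > 0 && key > heap.peek().key) {
                heap.poll(); // evict the current smallest key
                heap.add(new Keyed<>(key, items.get(i)));
            }
        }
        List<T> out = new ArrayList<>();
        for (Keyed<T> e : heap) {
            out.add(e.item);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        List<Double> weights = Arrays.asList(5.0, 1.0, 1.0, 0.5);
        System.out.println(sample(items, weights, 2, 42L));
    }
}
```

The order of the returned items follows the heap's internal layout and carries no meaning.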
withFactory(ReaderWriterFactory) - Method in class org.apache.crunch.types.avro.AvroMode
Creates a new AvroMode instance which will utilize the factory instance for creating Avro readers and writers.
withFactoryFromConfiguration(Configuration) - Method in class org.apache.crunch.types.avro.AvroMode
 
WordAggregationHBase - Class in org.apache.crunch.examples
You need to have an HBase instance running.
WordAggregationHBase() - Constructor for class org.apache.crunch.examples.WordAggregationHBase
 
WordCount - Class in org.apache.crunch.examples
 
WordCount() - Constructor for class org.apache.crunch.examples.WordCount
 
WritableDeepCopier<T extends org.apache.hadoop.io.Writable> - Class in org.apache.crunch.types.writable
Performs deep copies of Writable values.
WritableDeepCopier(Class<T>) - Constructor for class org.apache.crunch.types.writable.WritableDeepCopier
 
writables(Class<T>) - Static method in class org.apache.crunch.types.avro.Avros
 
Writables - Class in org.apache.crunch.types.writable
Defines static methods that are analogous to the methods defined in WritableTypeFamily for convenient static importing.
writables(Class<W>) - Static method in class org.apache.crunch.types.writable.Writables
 
writables(Class<W>) - Method in class org.apache.crunch.types.writable.WritableTypeFamily
 
WritableSerDe - Class in org.apache.crunch.impl.spark.serde
 
WritableSerDe(Class<? extends Writable>) - Constructor for class org.apache.crunch.impl.spark.serde.WritableSerDe
 
WritableType<T,W extends org.apache.hadoop.io.Writable> - Class in org.apache.crunch.types.writable
 
WritableType(Class<T>, Class<W>, MapFn<W, T>, MapFn<T, W>, PType...) - Constructor for class org.apache.crunch.types.writable.WritableType
 
WritableTypeFamily - Class in org.apache.crunch.types.writable
The Writable-based implementation of the PTypeFamily interface.
write(DataOutput) - Method in class org.apache.crunch.contrib.io.jdbc.IdentifiableName
 
write(PreparedStatement) - Method in class org.apache.crunch.contrib.io.jdbc.IdentifiableName
 
write(Target) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
write(Target, Target.WriteMode) - Method in class org.apache.crunch.impl.dist.collect.PCollectionImpl
 
write(Target) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
write(Target, Target.WriteMode) - Method in class org.apache.crunch.impl.dist.collect.PTableBase
 
write(PCollection<?>, Target) - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
write(PCollection<?>, Target, Target.WriteMode) - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
write(PCollection<?>, Target) - Method in class org.apache.crunch.impl.mem.MemPipeline
 
write(PCollection<?>, Target, Target.WriteMode) - Method in class org.apache.crunch.impl.mem.MemPipeline
 
write(String, K, V) - Method in class org.apache.crunch.io.CrunchOutputs
 
write(DataOutput) - Method in class org.apache.crunch.io.FormatBundle
 
write(Target) - Method in interface org.apache.crunch.PCollection
Write the contents of this PCollection to the given Target, using the storage format specified by the target.
write(Target, Target.WriteMode) - Method in interface org.apache.crunch.PCollection
Write the contents of this PCollection to the given Target, using the given Target.WriteMode to handle existing targets.
write(PCollection<?>, Target) - Method in interface org.apache.crunch.Pipeline
Write the given collection to the given target on the next pipeline run.
write(PCollection<?>, Target, Target.WriteMode) - Method in interface org.apache.crunch.Pipeline
Write the contents of the PCollection to the given Target, using the storage format specified by the target and the given WriteMode for cases where the referenced Target already exists.
write(Target) - Method in interface org.apache.crunch.PTable
Writes this PTable to the given Target.
write(Target, Target.WriteMode) - Method in interface org.apache.crunch.PTable
Writes this PTable to the given Target, using the given Target.WriteMode to handle existing targets.
write(DataOutput) - Method in class org.apache.crunch.types.writable.TupleWritable
Writes each Writable to out.
write(DataOutput) - Method in class org.apache.crunch.types.writable.UnionWritable
 
write(PCollection<?>, Target) - Method in class org.apache.crunch.util.CrunchTool
 
write(Configuration, Path, Object) - Static method in class org.apache.crunch.util.DistCache
 
writeTextFile(PCollection<T>, String) - Method in class org.apache.crunch.impl.dist.DistributedPipeline
 
writeTextFile(PCollection<T>, String) - Method in class org.apache.crunch.impl.mem.MemPipeline
 
writeTextFile(PCollection<T>, String) - Method in interface org.apache.crunch.Pipeline
A convenience method for writing a text file.
writeTextFile(PCollection<?>, String) - Method in class org.apache.crunch.util.CrunchTool
 
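
The weightedReservoirSample entries above describe drawing a sample in which each element carries a numerical weight. This index does not say which algorithm Sample uses, so the following is only a minimal plain-Java sketch of one well-known scheme (random keys of the form u^(1/w), usually attributed to Efraimidis and Spirakis), written without any Crunch types. The class name WeightedReservoirSketch, the Keyed helper, and the sample method signature are all hypothetical and are not part of the Crunch API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical illustration class; NOT part of org.apache.crunch.
public class WeightedReservoirSketch {

  // Hypothetical helper: a value together with its random sampling key.
  private static final class Keyed<T> {
    final double key;
    final T value;

    Keyed(double key, T value) {
      this.key = key;
      this.value = value;
    }
  }

  /**
   * Draws up to sampleSize items without replacement. Items with larger
   * weights are more likely to be kept. The seed is exposed so that runs
   * are repeatable, mirroring the seeded overload listed in the index.
   */
  public static <T> List<T> sample(List<T> items, List<Double> weights,
                                   int sampleSize, long seed) {
    if (items.size() != weights.size()) {
      throw new IllegalArgumentException("items and weights differ in length");
    }
    Random random = new Random(seed);
    // Min-heap on key: the head is the weakest item currently kept.
    PriorityQueue<Keyed<T>> reservoir =
        new PriorityQueue<>(Comparator.comparingDouble((Keyed<T> k) -> k.key));

    for (int i = 0; i < items.size(); i++) {
      double weight = weights.get(i);
      if (weight <= 0) {
        throw new IllegalArgumentException("weights must be positive");
      }
      // log(u) / w orders items exactly as u^(1/w) does, but is
      // numerically safer when weights are large.
      double key = Math.log(random.nextDouble()) / weight;
      if (reservoir.size() < sampleSize) {
        reservoir.add(new Keyed<>(key, items.get(i)));
      } else if (!reservoir.isEmpty() && key > reservoir.peek().key) {
        reservoir.poll(); // evict the weakest kept item
        reservoir.add(new Keyed<>(key, items.get(i)));
      }
    }

    List<T> result = new ArrayList<>();
    for (Keyed<T> kept : reservoir) {
      result.add(kept.value);
    }
    return result;
  }
}
```

Because only the sampleSize largest keys are retained, the sketch makes a single pass over the input and holds at most sampleSize elements at a time, which is the property that makes reservoir-style sampling attractive for large collections.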

X

xboolean() - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for booleans.
xboolean(Boolean) - Static method in class org.apache.crunch.contrib.text.Extractors
 
xcollect(TokenizerFactory, Extractor<T>) - Static method in class org.apache.crunch.contrib.text.Extractors
 
xcustom(Class<T>, TokenizerFactory, Extractor...) - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for a subclass of Tuple with a constructor that has the given extractor types that uses the given TokenizerFactory for parsing the sub-fields.
xdouble() - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for doubles.
xdouble(Double) - Static method in class org.apache.crunch.contrib.text.Extractors
 
xfloat() - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for floats.
xfloat(Float) - Static method in class org.apache.crunch.contrib.text.Extractors
 
xint() - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for integers.
xint(Integer) - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for integers.
xlong() - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for longs.
xlong(Long) - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for longs.
xpair(TokenizerFactory, Extractor<K>, Extractor<V>) - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for pairs of the given types that uses the given TokenizerFactory for parsing the sub-fields.
xquad(TokenizerFactory, Extractor<A>, Extractor<B>, Extractor<C>, Extractor<D>) - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for quads of the given types that uses the given TokenizerFactory for parsing the sub-fields.
xstring() - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for strings.
xstring(String) - Static method in class org.apache.crunch.contrib.text.Extractors
 
xtriple(TokenizerFactory, Extractor<A>, Extractor<B>, Extractor<C>) - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for triples of the given types that uses the given TokenizerFactory for parsing the sub-fields.
xtupleN(TokenizerFactory, Extractor...) - Static method in class org.apache.crunch.contrib.text.Extractors
Returns an Extractor for an arbitrary number of types that uses the given TokenizerFactory for parsing the sub-fields.

Z

zero(Map<String, Map<String, Long>>) - Method in class org.apache.crunch.impl.spark.CounterAccumulatorParam
 

Copyright © 2014 The Apache Software Foundation. All Rights Reserved.