Extractor instances that delegate the parsing of fields to other
 Extractor instances, primarily used for constructing composite records that implement
 the Tuple interface.
Extractor instances that construct a single
 object from a block of text stored in a String, with support for error handling
 and reporting.
Target instance is compatible with the
 given PType.
PCollection instances.
PCollection that contains the result of aggregating all values in this instance.
Aggregators.
Converter instance
 before (for outputs) or after (for inputs) using the associated PType#getInputMapFn
 and PType#getOutputMapFn calls.
PObject encapsulating a Map made up of the keys and values in this
 PTable.
PObject encapsulating a Map made up of the keys and values in this
 PTable.
PCollection<Pair<K, V>> to a PTable<K, V>.
SourceTarget type that corresponds to this Target
 for the given PType, if possible.
SourceTarget types, which may be treated as both a Source
 and a Target.
SourceTarget<T> instance from the Avro file(s) at the given path name.
SourceTarget<T> instance from the Avro file(s) at the given Path.
SourceTarget<GenericData.Record> by reading the schema of the Avro file
 at the given path.
SourceTarget<GenericData.Record> by reading the schema of the Avro file
 at the given path.
SourceTarget<GenericData.Record> by reading the schema of the Avro file
 at the given path using the FileSystem information contained in the given
 Configuration instance.
SourceTarget<T> instance from the Avro file(s) at the given path name.
SourceTarget<T> instance from the Avro file(s) at the given Path.
Source<T> instance from the Avro file(s) at the given path name.
Source<T> instance from the Avro file(s) at the given Path.
Source<T> instance from the Avro file(s) at the given Paths.
Source<T> instance from the Avro file(s) at the given path name.
Source<T> instance from the Avro file(s) at the given Path.
Source<T> instance from the Avro file(s) at the given Paths.
Source<GenericData.Record> by reading the schema of the Avro file
 at the given path.
Source<GenericData.Record> by reading the schema of the Avro file
 at the given path.
Source<GenericData.Record> by reading the schema of the Avro file
 at the given paths.
Source<GenericData.Record> by reading the schema of the Avro file
 at the given path using the FileSystem information contained in the given
 Configuration instance.
Source<GenericData.Record> by reading the schema of the Avro file
 at the given paths using the FileSystem information contained in the given
 Configuration instance.
Target at the given path name that writes data to
 Avro files.
Target at the given Path that writes data to
 Avro files.
InputFormat for Avro data files.
OutputFormat for Avro data files.
FileOutputFormat that takes in a Utf8 and an Avro record and writes the Avro records to
 a sub-directory of the output path whose name is equal to the string form of the Utf8.
AvroTypeFamily for convenient static importing.
TableSource<K,V> for reading an Avro key/value file at the given path.
TableSource<K,V> for reading an Avro key/value file at the given paths.
InputFormat for text files.
BigInteger type.
TokenizerFactory with settings determined by this
 Builder instance.
CachingOptions.Builder instance to use for specifying the caching options for a particular
 PCollection<T>.
TokenizerFactory.Builder instance.
PTable.
PTable.
CachingOptions.
CachingOptions.
PCollection<T> is cached for subsequent processing.
CachingOptions for a PCollection.
PTable or PCollection
 instances.
Pair instances emitted by DoFn into
 separate PCollection instances.
DoFn is
 associated with.
FilterFn is
 associated with.
DoFn is
 associated with.
DoFn is
 associated with.
running the pipeline.
PTable arguments.
PTable arguments with a user-specified degree of parallelism (a.k.a. the number of
 reducers).
PTable arguments.
PTable arguments with a user-specified degree of parallelism (a.k.a. the number of
 reducers).
PTable arguments.
PTable arguments with a user-specified degree of parallelism (a.k.a. the number of
 reducers).
PTable arguments.
PTable arguments with a user-specified degree of parallelism
 (a.k.a. the number of reducers). The largest table should come last in the ordering.
DoFn implementation that converts an Iterable of
 values into a single value.
CombineFn.
CombineFn instances.
Aggregator.
Aggregator instances.
comm utility.
Configuration object associated with the
 Job that includes these options.
Configuration instance(s) that are used to
 read and write this SourceTarget<T>.
Configuration
 object.
bundle with mode specific settings for the specific FormatBundle.
conf with mode specific settings.
AvroMode.configure(org.apache.hadoop.conf.Configuration)
conf with mode specific settings for use during the shuffle phase.
DoFn, or takes the output of a DoFn and writes it to the
 output key/values.
File.
Path.
PTable that contains the unique elements of this collection mapped to a count
 of their occurrences.
PTable that contains the unique elements of this collection mapped to a count
 of their occurrences.
PTable instance that contains the counts of each unique
 element of this PCollection.
Scanner instance that wraps the input string and uses the delimiter,
 skip, and locale settings for this TokenizerFactory instance.
MapsideJoinStrategy instance that will load its left-side table into memory,
 and will materialize the contents of the left-side table to disk before running the in-memory join.
MapsideJoinStrategy instance that will load its left-side table into memory.
PType<S> that respects the given column
 orderings.
PTables (using the same
 strategy as Pig's CROSS operator).
PTables (using the same
 strategy as Pig's CROSS operator).
PCollections (using the
 same strategy as Pig's CROSS operator).
PCollections (using the
 same strategy as Pig's CROSS operator).
InputFormat instances within a single
 Crunch MapReduce job.
CrunchInputs for handling multiple OutputFormat instances
 writing to multiple files within a single MapReduce job.
RuntimeException implementation that includes some additional options
 for the Crunch execution engine to track reporting status.
Tool interface that creates a Pipeline
 instance and provides methods for working with the Pipeline from inside of
 the Tool's run method.
CachingOptions with the default caching settings.
ReadableData<T> interface by delegating to a ReadableData<S> instance
 and passing its contents through a DoFn<S, T>.
TokenizerFactory instances constructed by
 this instance.
Target exists before this instance may be
 executed.
PCollection be materialized to disk before this instance may be
 executed.
PCollection.
PCollection that contains the unique elements of a
 given input PCollection.
PTable<K, V> analogue of the distinct function.
distinct operation that gives the client more control over how frequently
 elements are flushed to disk in order to allow control over performance or
 memory consumption.
PTable<K, V> analogue of the distinct function.
Iterator<T> that combines a delegate Iterator<S> and a DoFn<S, T>, generating
 data by passing the contents of the iterator through the function.
run.
DoFn.
PCollection of the given PType.
PTable of the given PTable Type.
Level.
Appender at the specified Level.
Enum type.
extract on this instance
 threw an exception that was handled.
MapFn into a key-value pair that is
 used to convert from a PCollection<V> to a PTable<K, V>.
Scanner object.
Extractor types.
Extractor encountered when parsing
 input data.
PCollection.
PCollection.
PTable.
PTable.
DoFn for the common case of filtering the members of a
 PCollection based on a boolean condition.
FilterFn implementations.
Counter class changed incompatibly between Hadoop 1 and 2
 (from a class to an interface) so user programs should avoid this method and use
 PipelineResult.StageResult.getCounterValue(Enum) and/or PipelineResult.StageResult.getCounterDisplayName(Enum).
n values (or fewer if there are fewer values than n).
InputFormat or OutputFormat and any extra 
 configuration information that format class needs to run.
TableSource<K, V> for reading data from files that have custom
 FileInputFormat<K, V> implementations not covered by the provided TableSource
 and Source factory methods.
TableSource<K, V> for reading data from files that have custom
 FileInputFormat<K, V> implementations not covered by the provided TableSource
 and Source factory methods.
TableSource<K, V> for reading data from files that have custom
 FileInputFormat<K, V> implementations not covered by the provided TableSource
 and Source factory methods.
TableSource<K, V> for reading data from files that have custom
 FileInputFormat implementations not covered by the provided TableSource
 and Source factory methods.
TableSource<K, V> for reading data from files that have custom
 FileInputFormat implementations not covered by the provided TableSource
 and Source factory methods.
TableSource<K, V> for reading data from files that have custom
 FileInputFormat implementations not covered by the provided TableSource
 and Source factory methods.
Target at the given path name that writes data to
 a custom FileOutputFormat.
Target at the given Path that writes data to
 a custom FileOutputFormat.
Source types.
AvroMode.AVRO_MODE_PROPERTY property in the conf.
AvroMode.AVRO_SHUFFLE_MODE_PROPERTY property in the conf.
AvroMode based upon the specified type.
PTables.
Pipeline when this instance is registered with Pipeline#sequentialDo.
Generic types.
Configuration instance associated with this pipeline.
Converter used for mapping the inputs from this instance
 into PCollection or PTable values.
Converter to use for mapping from the output PCollection
 into the output values expected by this instance.
Counter class changed incompatibly between Hadoop 1 and 2
 (from a class to an interface) so user programs should avoid this method and use
 PipelineResult.StageResult.getCounterNames().
GenericData instance based on the mode type.
SourceTarget that is able to read/write data using the serialization format
 specified by this PType.
TokenizerFactory that uses whitespace as a delimiter and does
 not skip any input fields.
Extractor in case of an
 error.
Pair.
PTypeFamily that this PType belongs to.
Pair or TupleN.
File below the temporary directory.
PGroupedTable value.
PGroupedTableType containing serialization information for
 this PGroupedTable.
PType of the key.
Source was most recently
 modified (e.g., because an input file was edited or new files were added to
 a directory).
PCollection.
Configuration instance needs to enable
 this AvroMode as a serializable map of key-value pairs.
Path below the temporary directory.
Pipeline associated with this PCollection.
PTableType of this PTable.
PType associated with this data type for the
 given PTypeFamily.
PType of this PCollection.
DatumReader based on the schema.
null.
Path.
PCollection in
 bytes.
Source.
ClassLoader to be used for loading Avro org.apache.avro.specific.SpecificRecord
 and reflection implementation classes.
Extractor instance
 encountered while parsing input data.
TupleFactory for a given Tuple implementation.
PType for this source.
PType.
PTypeFamily of this PCollection.
PObject.
PType of the value.
DatumWriter based on the schema.
GroupingOptions to control how the grouping is executed.
groupByKey operation in order to
 exercise finer control over how the partitioning, grouping, and sorting of
 keys is performed.
GroupingOptions instances.
WriteMode to this Target instance.
Scanner has any tokens remaining.
Extractor during the
 start of a map or reduce task.
PTables.
Configuration instance that is used to read
 this Source<T>.
PTable instances based on a common
 lastKey.
PTable instances using a user-specified JoinFn.
PTables.
DoFn for performing joins.
ObjectMapper.
PTable<K, V> as a PCollection<K>.
PCollection made up of the keys in this PTable.
n values (or fewer if there are fewer values than n).
PTables.
PCollection.
Locale to use with the TokenizerFactory returned by
 this Builder instance.
DoFn for the common case of emitting exactly one value for each
 input record.
PTable<K1, V> to a PTable<K2, V> using the given MapFn<K1, K2> on
 the keys of the PTable.
PTable<K1, V> to a PTable<K2, V> using the given MapFn<K1, K2> on
 the keys of the PTable.
PTable that has the same values as this instance, but
 uses the given function to map the keys.
PTable that has the same values as this instance, but
 uses the given function to map the keys.
PTables.
MapsideJoinStrategy.create() factory method instead
MapsideJoinStrategy.create(boolean) factory method instead
PTable<K, U> to a PTable<K, V> using the given MapFn<U, V> on
 the values of the PTable.
PTable<K, U> to a PTable<K, V> using the given MapFn<U, V> on
 the values of the PTable.
mapValues function for PGroupedTable<K, U> collections.
mapValues function for PGroupedTable<K, U> collections.
Iterable<V> elements of each record to a new type.
Iterable<V> elements of each record to a new type.
PTable that has the same keys as this instance, but
 uses the given function to map the values.
PTable that has the same keys as this instance, but
 uses the given function to map the values.
PObject of the maximum element of this instance.
BigInteger values.
n largest BigInteger values (or fewer if there are fewer
 values than n).
double values.
n largest double values (or fewer if there are fewer
 values than n).
float values.
n largest float values (or fewer if there are fewer
 values than n).
int values.
n largest int values (or fewer if there are fewer
 values than n).
long values.
n largest long values (or fewer if there are fewer
 values than n).
n largest values (or fewer if there are fewer
 values than n).
PObject of the minimum element of this instance.
BigInteger values.
n smallest BigInteger values (or fewer if there are fewer
 values than n).
double values.
n smallest double values (or fewer if there are fewer
 values than n).
float values.
n smallest float values (or fewer if there are fewer
 values than n).
int values.
n smallest int values (or fewer if there are fewer
 values than n).
long values.
n smallest long values (or fewer if there are fewer
 values than n).
n smallest values (or fewer if there are fewer
 values than n).
Tokenizer and return the next String from the Scanner.
Tokenizer and return the next Boolean from the Scanner.
Tokenizer and return the next Double from the Scanner.
Tokenizer and return the next Float from the Scanner.
Tokenizer and return the next Integer from the Scanner.
Tokenizer and return the next Long from the Scanner.
DeepCopier that does nothing, and just returns the input value without copying anything.
Configuration instance that is used to write
 this Target.
AvroMode.withFactory(ReaderWriterFactory) instead.
Tuples.
Pair.
PCollection and
 returns a new PCollection that is the output of this processing.
PCollection and
 returns a new PCollection that is the output of this processing.
PCollection and
 returns a new PCollection that is the output of this processing.
parallelDo instance, but returns a
 PTable instance instead of a PCollection.
parallelDo instance, but returns a
 PTable instance instead of a PCollection.
parallelDo instance, but returns a
 PTable instance instead of a PCollection.
parallelDo operation
 applied to a PCollection.
PCollection<String> into PCollections of strongly-typed
 tuples.
PCollection<String> and returns a PCollection<T> using
 the given Extractor<T>.
PCollection<String> and returns a PCollection<T> using
 the given Extractor<T> that uses the given PTypeFamily.
PCollection<String> and returns a PTable<K, V> using
 the given Extractor<Pair<K, V>>.
PCollection<String> and returns a PTable<K, V> using
 the given Extractor<Pair<K, V>> that uses the given PTypeFamily.
PTable, which corresponds to the output of
 the shuffle phase of a MapReduce job.
PType instance for PGroupedTable instances.
Callable that executes some sequential logic on the client machine as
 part of an overall Crunch pipeline in order to generate zero or more outputs, some of
 which may be PCollection instances that are processed by other jobs in the
 pipeline.
run or done on the
 Pipeline interface that includes details and statistics about the component
 stages of the data pipeline.
PObject represents a singleton object value that results from a distributed
 computation.
PCollection.
SerializableSupplier to provide
 an ExtensionRegistry to use in reading the given protobuf.
PCollection that represents an immutable,
 distributed multi-map of keys and values.
PType specifically for PTable objects.
PType defines a mapping between a data type that is used in a Crunch pipeline and a
 serialization and storage format that is used to read/write data from/to HDFS.
PType instances that have the same
 serialization/storage backing format.
PTypes from different
 PTypeFamily implementations.
Tuple4.
Iterable that contains the contents of this source.
Source into a PCollection that is
 available to jobs run using this Pipeline instance.
TableSource instances that map to
 PTables.
Source interface that indicates that a
 Source instance may be read as a series of records by the client
 code.
SourceTarget instance can be read
 into the local client.
Reflect types.
WritableComparable class so that it can be used for comparing the fields inside of
 tuple types (e.g., pairs, trips, tupleN, etc.) for use in sorts and
 secondary sorts.
WritableComparable class with a given integer code to use for serializing
 and deserializing instances of this class that are defined inside of tuple types (e.g., pairs,
 trips, tupleN, etc.). Unregistered Writables are always serialized to bytes and
 cannot be used in comparisons (e.g., sorts and secondary sorts) according to their underlying types.
PCollection with each element
 equally likely to be included in the sample.
PTables.
ListenableFuture to allow clients to control
 job execution.
PipelineCallable instances.
PCollection with an independent probability in order to sample some
 fraction of the overall data set, or by using reservoir sampling in order to pull a uniform
 or weighted sample of fixed size from a PCollection of an unknown size.
PCollection with the given probability.
PCollection using a given seed.
PTable<K, V> analogue of the sample function.
PTable<K, V> analogue of the sample function, with the seed argument
 exposed for testing purposes.
equals method for the input objects.
PCollection
 will cause it to change in size.
DoubleFlatMapFunction.
DoubleFunction.
PTable<K, Pair<V1, V2>> collection.
SourceTarget<T> instance from the SequenceFile(s) at the given path name
 from the value field of each key-value pair in the SequenceFile(s).
SourceTarget<T> instance from the SequenceFile(s) at the given Path
 from the value field of each key-value pair in the SequenceFile(s).
SourceTarget<T> instance from the SequenceFile(s) at the given path name
 from the value field of each key-value pair in the SequenceFile(s).
SourceTarget<T> instance from the SequenceFile(s) at the given Path
 from the value field of each key-value pair in the SequenceFile(s).
TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name
 from the key-value pairs in the SequenceFile(s).
TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path
 from the key-value pairs in the SequenceFile(s).
TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name
 from the key-value pairs in the SequenceFile(s).
TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path
 from the key-value pairs in the SequenceFile(s).
Source<T> instance from the SequenceFile(s) at the given path name
 from the value field of each key-value pair in the SequenceFile(s).
Source<T> instance from the SequenceFile(s) at the given Path
 from the value field of each key-value pair in the SequenceFile(s).
Source<T> instance from the SequenceFile(s) at the given Paths
 from the value field of each key-value pair in the SequenceFile(s).
Source<T> instance from the SequenceFile(s) at the given path name
 from the value field of each key-value pair in the SequenceFile(s).
Source<T> instance from the SequenceFile(s) at the given Path
 from the value field of each key-value pair in the SequenceFile(s).
Source<T> instance from the SequenceFile(s) at the given Paths
 from the value field of each key-value pair in the SequenceFile(s).
TableSource<K, V> instance for the SequenceFile(s) at the given path name.
TableSource<K, V> instance for the SequenceFile(s) at the given Path.
TableSource<K, V> instance for the SequenceFile(s) at the given Paths.
TableSource<K, V> instance for the SequenceFile(s) at the given path name.
TableSource<K, V> instance for the SequenceFile(s) at the given Path.
TableSource<K, V> instance for the SequenceFile(s) at the given Paths.
Target at the given path name that writes data to
 SequenceFiles.
Target at the given Path that writes data to
 SequenceFiles.
PCollection as a dependency to the given
 PipelineCallable and registers it with the Pipeline associated with this
 instance.
PipelineCallable on the client after the Targets
 that the PipelineCallable depends on (if any) have been created by other pipeline
 processing steps.
FileNamingScheme that uses an incrementing sequence number in
 order to generate unique file names.
Supplier interface that indicates that an instance
 will also implement Serializable, which makes this object suitable for use
 with Crunch's DoFns when we need to construct an instance of a non-serializable
 type for use in processing.
PCollection instances.
PType that
 relies on this instance.
Configuration to use with this pipeline.
TaskInputOutputContext to this
 DoFn instance.
ClassLoader that will be used for loading Avro org.apache.avro.specific.SpecificRecord
 and reflection implementation classes.
FlatMapFunction.
FlatMapFunction2.
Function.
Function2.
PCollection is balanced across reducers
 and output files.
PCollection<T> that has the same contents as its input argument but will
 be written to a fixed number of output files.
Scanner that is returned by the constructed
 TokenizerFactory.
PCollection instances.
PCollection using the natural ordering of its elements in ascending order.
PCollection using the natural order of its elements with the given Order.
PCollection using the natural ordering of its elements in
 the order specified using the given number of reducers.
PTable using the natural ordering of its keys in ascending order.
PTable using the natural ordering of its keys with the given Order.
PTable using the natural ordering of its keys in the
 order specified with a client-specified number of reducers.
sortPairs(coll, by(2, ASCENDING), by(1, DESCENDING))
  Column numbering is 1-based.
PTable instance and then apply a
 DoFn to the resulting sorted data to yield an output PCollection<T>.
PTable instance and then apply a
 DoFn to the resulting sorted data to yield an output PCollection<T>, using
 the given number of reducers.
PTable instance and then apply a
 DoFn to the resulting sorted data to yield an output PTable<U, V>.
PTable instance and then apply a
 DoFn to the resulting sorted data to yield an output PTable<U, V>, using
 the given number of reducers.
DoFns that are used by Crunch's Sort library.
GenericRecord instance.
Tuple instance.
Tuple instance.
PCollection of Pairs using the specified column
 ordering.
PCollection of Tuple4s using the specified column
 ordering.
PCollection of Tuple3s using the specified column
 ordering.
PCollection of tuples using the specified column ordering.
PCollection of TupleNs using the specified column
 ordering and a client-specified number of reducers.
Source represents an input data set that is an input to one or more
 MapReduce jobs.
Source and the
 Target interfaces.
PairFlatMapFunction.
PairFunction.
Specific types.
PCollection of any Pair of objects into a Pair of
 PCollections, to allow for the output of a DoFn to be handled using
 separate channels.
PCollection of any Pair of objects into a Pair of
 PCollections, to allow for the output of a DoFn to be handled using
 separate channels.
BigInteger values.
double values.
float values.
int values.
long values.
Pair type.
Source implementations that return a PTable.
TableSource and the
 Target interfaces.
Target represents the output destination of a Crunch PCollection
 in the context of a Crunch job.
TemporaryPath.
SourceTarget<String> instance for the text file(s) at the given path name.
SourceTarget<String> instance for the text file(s) at the given Path.
SourceTarget<T> instance for the text file(s) at the given path name using
 the provided PType<T> to convert the input text.
SourceTarget<T> instance for the text file(s) at the given Path using
 the provided PType<T> to convert the input text.
Source<String> instance for the text file(s) at the given path name.
Source<String> instance for the text file(s) at the given Path.
Source<String> instance for the text file(s) at the given Paths.
Source<T> instance for the text file(s) at the given path name using
 the provided PType<T> to convert the input text.
Source<T> instance for the text file(s) at the given Path using
 the provided PType<T> to convert the input text.
Source<T> instance for the text file(s) at the given Paths using
 the provided PType<T> to convert the input text.
Target at the given path name that writes data to
 text files.
Target at the given Path that writes data to
 text files.
Target types.
CombineFn adapter around the given aggregator.
Scanner instance and provides support for returning only a subset
 of the fields returned by the underlying Scanner.
Tokenizer instance.
Tokenizer instances for input strings that use a fixed
 set of delimiters, skip patterns, locales, and sets of indices to keep or drop.
TokenizerFactory instances using the Builder pattern.
Partitioner instance that can work with either Avro or Writable-formatted
 keys.
Tuple3.
PCollections.
Tuples.
Tuples.
Tuple.
Tuple instance for an arbitrary number of values.
Tuple interface.
Tuple.
PCollection instance that acts as the union of this
 PCollection and the given PCollection.
PCollection instance that acts as the union of this
 PCollection and the input PCollections.
PTable instance that acts as the union of this
 PTable and the other PTables.
PTable instance that acts as the union of this
 PTable and the input PTables.
equals method for
 the input objects.
UUID type.
PTable<K, V> as a PCollection<V>.
PCollection made up of the values in this PTable.
PCollection, where the second term in
 the input Pair is a numerical weight.
AvroMode instance which will utilize the factory instance
 for creating Avro readers and writers.
WritableTypeFamily for convenient static importing.
Writable-based implementation of the
 PTypeFamily interface.
PCollection to the given Target,
 using the storage format specified by the target.
PCollection to the given Target,
 using the given Target.WriteMode to handle existing
 targets.
PCollection to the given Target,
 using the storage format specified by the target and the given
 WriteMode for cases where the referenced Target
 already exists.
PTable to the given Target.
PTable to the given Target, using the
 given Target.WriteMode to handle existing targets.
out.
Tuple with a constructor that
 has the given extractor types that uses the given TokenizerFactory
 for parsing the sub-fields.
TokenizerFactory
 for parsing the sub-fields.
TokenizerFactory
 for parsing the sub-fields.
TokenizerFactory
 for parsing the sub-fields.
TokenizerFactory
 for parsing the sub-fields.
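
The sampling entries in this index mention reservoir sampling as a way to pull a uniform sample of fixed size from a PCollection of an unknown size. As a rough illustration of that idea only, the following is a minimal plain-Java sketch (the classic "Algorithm R") over an ordinary Iterable. The class name ReservoirSampleSketch and its sample method are hypothetical names chosen for this example; they are not part of the Crunch API, and nothing here is claimed to match how Crunch implements its sampling functions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class; not part of Apache Crunch.
final class ReservoirSampleSketch {

    private ReservoirSampleSketch() {
    }

    /**
     * Returns up to k elements chosen uniformly at random from input,
     * reading the input exactly once and without knowing its size up front.
     */
    static <T> List<T> sample(Iterable<T> input, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0; // number of elements consumed so far
        for (T item : input) {
            seen++;
            if (reservoir.size() < k) {
                // The first k elements always fill the reservoir.
                reservoir.add(item);
            } else {
                // Pick a slot index in [0, seen); the new element replaces an
                // existing one only if the index falls inside the reservoir,
                // which happens with probability k / seen.
                long slot = (long) (rng.nextDouble() * seen);
                if (slot < k) {
                    reservoir.set((int) slot, item);
                }
            }
        }
        return reservoir;
    }
}
```

Calling ReservoirSampleSketch.sample(values, 10, new Random(seed)) on any Iterable returns at most 10 elements; passing a fixed seed makes the result repeatable, which is the same reason the index lists sample variants that expose a seed argument for testing.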