Extractor instances that delegate the parsing of fields to other
Extractor instances, primarily used for constructing composite records that implement
the Tuple interface.
Extractor instances that construct a single
object from a block of text stored in a String, with support for error handling
and reporting.
Target instance is compatible with the
given PType.
PCollection instances.
Aggregators.toCombineFn(Aggregator)
PGroupedTable.combineValues(Aggregator) which doesn't require a factory.
Aggregators.FilterFns.and(FilterFn...)
PObject encapsulating a Map made up of the keys and values in this
PTable.
PCollection<Pair<K, V>> to a PTable<K, V>.
SourceTarget type that corresponds to this Target
for the given PType, if possible.
SourceTarget types, which may be treated as both a Source
and a Target.
SourceTarget<T> instance from the Avro file(s) at the given path name.
SourceTarget<T> instance from the Avro file(s) at the given Path.
SourceTarget<T> instance from the Avro file(s) at the given path name.
SourceTarget<T> instance from the Avro file(s) at the given Path.
Source<T> instance from the Avro file(s) at the given path name.
Source<T> instance from the Avro file(s) at the given Path.
Source<T> instance from the Avro file(s) at the given path name.
Source<T> instance from the Avro file(s) at the given Path.
Target at the given path name that writes data to
Avro files.
Target at the given Path that writes data to
Avro files.
InputFormat for Avro data files.
OutputFormat for Avro data files.
AvroTypeFamily for convenient static importing.
InputFormat for text files.
TokenizerFactory with settings determined by this
Builder instance.
TokenizerFactory.Builder instance.
PTable.
PTable.
PTable or PCollection
instances.
DoFn is
associated with.
FilterFn is
associated with.
DoFn is
associated with.
DoFn is
associated with.
PTable arguments.
DoFn implementation that converts an Iterable of
values into a single value.
Aggregator
Aggregators.toCombineFn(org.apache.crunch.Aggregator) adapter
PGroupedTable.combineValues(Aggregator) which doesn't require a factory.
Aggregators.FIRST_N(int)
Aggregators.LAST_N(int)
Aggregators.MAX_BIGINTS()
Aggregators.MAX_DOUBLES()
Aggregators.MAX_FLOATS()
Aggregators.MAX_INTS()
Aggregators.MAX_LONGS()
Aggregators.MAX_N(int, Class)
Aggregators.MIN_BIGINTS()
Aggregators.MIN_DOUBLES()
Aggregators.MIN_FLOATS()
Aggregators.MIN_INTS()
Aggregators.MIN_LONGS()
Aggregators.MIN_N(int, Class)
Aggregators.pairAggregator(Aggregator, Aggregator)
Aggregators.quadAggregator(Aggregator, Aggregator, Aggregator, Aggregator)
Aggregators.SimpleAggregator
Aggregators.STRING_CONCAT(String, boolean, long, long)
Aggregators.SUM_BIGINTS()
Aggregators.SUM_DOUBLES()
Aggregators.SUM_FLOATS()
Aggregators.SUM_INTS()
Aggregators.SUM_LONGS()
Aggregators.tripAggregator(Aggregator, Aggregator, Aggregator)
Aggregators.tupleAggregator(Aggregator...)
CombineFn.
Aggregator.
comm utility.
DoFn, or takes the output of a DoFn and writes it to the
output key/values.
File.
Path.
PTable that contains the unique elements of this collection mapped to a count
of their occurrences.
PTable instance that contains the counts of each unique
element of this PCollection.
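The count entries above map each unique element to the number of times it occurs. As a single-machine illustration of that semantics only (plain Java collections rather than the Crunch API; `CountSketch` is a hypothetical name), the computation amounts to:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of "count the occurrences of each unique element".
// It is not Crunch's implementation and runs on a single machine.
final class CountSketch {
    static <T> Map<T, Long> count(List<T> elements) {
        Map<T, Long> counts = new LinkedHashMap<>();
        for (T e : elements) {
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(e, 1L, Long::sum);
        }
        return counts;
    }
}
```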
Scanner instance that wraps the input string and uses the delimiter,
skip, and locale settings for this TokenizerFactory instance.
PType<S> that respects the given column
orderings.
PTables (using the same
strategy as Pig's CROSS operator).
PTables (using the same
strategy as Pig's CROSS operator).
PCollections (using the
same strategy as Pig's CROSS operator).
PCollections (using the
same strategy as Pig's CROSS operator).
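The cross-product entries describe pairing every element of one collection with every element of another. The sketch below (hypothetical `CrossSketch`, plain Java lists) shows only the resulting pairs; it does not model how Pig's CROSS strategy distributes the data across a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of a cross product: every left element is
// paired with every right element, so the output has left.size() * right.size() pairs.
final class CrossSketch {
    static <L, R> List<Map.Entry<L, R>> cross(List<L> left, List<R> right) {
        List<Map.Entry<L, R>> out = new ArrayList<>(left.size() * right.size());
        for (L l : left) {
            for (R r : right) {
                out.add(new SimpleEntry<>(l, r));
            }
        }
        return out;
    }
}
```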
InputFormat instances within a single
Crunch MapReduce job.
CrunchInputs for handling multiple OutputFormat instances
writing to multiple files within a single MapReduce job.
RuntimeException implementation that includes some additional options
for the Crunch execution engine to track reporting status.
Tool interface that creates a Pipeline
instance and provides methods for working with the Pipeline from inside of
the Tool's run method.
TokenizerFactory instances constructed by
this instance.
PCollection.
PCollection that contains the unique elements of a
given input PCollection.
PTable<K, V> analogue of the distinct function.
distinct operation that gives the client more control over how frequently
elements are flushed to disk in order to allow control over performance or
memory consumption.
PTable<K, V> analogue of the distinct function.
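The flush-controlled distinct entry describes a trade-off between memory use and performance. A hedged, single-process sketch of that general idea follows, assuming a map-side buffer that forgets what it has seen every `flushEvery` unique elements, followed by a final full de-duplication. The names are hypothetical and this is not Crunch's code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical two-stage sketch of distinct with a flush interval.
final class DistinctSketch {
    // Stage 1: drop duplicates seen since the last flush; clear the buffer
    // (i.e. "flush" it downstream) once it holds flushEvery unique elements.
    static <T> List<T> bufferedDedup(List<T> input, int flushEvery) {
        if (flushEvery < 1) throw new IllegalArgumentException("flushEvery must be >= 1");
        List<T> emitted = new ArrayList<>();
        Set<T> buffer = new HashSet<>();
        for (T e : input) {
            if (buffer.add(e) && buffer.size() >= flushEvery) {
                emitted.addAll(buffer);
                buffer.clear();
            }
        }
        emitted.addAll(buffer); // final flush of whatever remains
        return emitted;
    }

    // Stage 2: full de-duplication of everything stage 1 emitted.
    static <T> Set<T> distinct(List<T> input, int flushEvery) {
        return new LinkedHashSet<>(bufferedDedup(input, flushEvery));
    }
}
```

With a small `flushEvery` the buffer stays small but more duplicates survive to the final stage; with a large one fewer duplicates are emitted at the cost of more memory.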
run.
DoFn.Level.
Appender at the specified Level.
extract on this instance
threw an exception that was handled.
MapFn into a key-value pair that is
used to convert from a PCollection<V> to a PTable<K, V>.
Scanner object.
Extractor types.
Extractor encountered when parsing
input data.
PCollection.
PCollection.
PTable.
PTable.
DoFn for the common case of filtering the members of a
PCollection based on a boolean condition.
FilterFns.and(FilterFn...)
FilterFns.not(FilterFn)
FilterFns.or(FilterFn...)
FilterFn implementations.
Aggregators.FIRST_N(int)
n values (or fewer if there are fewer values than n).
InputFormat or OutputFormat and any extra
configuration information that format class needs to run.
TableSource<K, V> for reading data from files that have custom
FileInputFormat<K, V> implementations not covered by the provided TableSource
and Source factory methods.
TableSource<K, V> for reading data from files that have custom
FileInputFormat<K, V> implementations not covered by the provided TableSource
and Source factory methods.
TableSource<K, V> for reading data from files that have custom
FileInputFormat implementations not covered by the provided TableSource
and Source factory methods.
TableSource<K, V> for reading data from files that have custom
FileInputFormat implementations not covered by the provided TableSource
and Source factory methods.
Target at the given path name that writes data to
a custom FileOutputFormat.
Target at the given Path that writes data to
a custom FileOutputFormat.
Source types.
PTables.
Configuration instance associated with this pipeline.
SourceTarget that is able to read/write data using the serialization format
specified by this PType.
TokenizerFactory that uses whitespace as a delimiter and does
not skip any input fields.
Extractor in case of an
error.
Pair.
PTypeFamily that this PType belongs to.
Pair or TupleN.
File below the temporary directory.
PGroupedTable value.
PType of the key.
PCollection.
Path below the temporary directory.
Pipeline associated with this PCollection.
PTableType of this PTable.
PType associated with this data type for the
given PTypeFamily.
PType of this PCollection.
null.
Path.
PCollection in
bytes.
Source.
Extractor instance
encountered while parsing input data.
TupleFactory for a given Tuple implementation.
PType for this source.
PType.
PTypeFamily of this PCollection.
PObject.
PType of the value.
GroupingOptions to control how the grouping is executed.
groupByKey operation in order to
exercise finer control over how the partitioning, grouping, and sorting of
keys is performed.
GroupingOptions instances.
WriteMode to this Target instance.
Scanner has any tokens remaining.
Extractor during the
start of a map or reduce task.
PTables.
PTable instances based on a common
lastKey.
PTables.
DoFn for performing joins.
PTable<K, V> as a PCollection<K>.
PCollection made up of the keys in this PTable.
Aggregators.LAST_N(int)
n values (or fewer if there are fewer values than n).
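The FIRST_N and LAST_N entries return at most n values, or fewer when fewer are available. A plain-Java sketch of those semantics follows (hypothetical `FirstLastSketch`). In a distributed job, which values count as "first" or "last" depends on the order in which they arrive, which this sketch does not model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: keep at most n values, returning fewer if the input is shorter.
final class FirstLastSketch {
    static <T> List<T> firstN(Iterable<T> values, int n) {
        List<T> out = new ArrayList<>();
        for (T v : values) {
            if (out.size() >= n) break; // stop once n values have been collected
            out.add(v);
        }
        return out;
    }

    // Note: ArrayDeque rejects null elements, so null values are not supported here.
    static <T> List<T> lastN(Iterable<T> values, int n) {
        Deque<T> window = new ArrayDeque<>();
        for (T v : values) {
            window.addLast(v);
            if (window.size() > n) window.removeFirst(); // evict the oldest value
        }
        return new ArrayList<>(window);
    }
}
```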
PTables.
PCollection.
Locale to use with the TokenizerFactory returned by
this Builder instance.
DoFn for the common case of emitting exactly one value for each
input record.
PTables.
PObject of the maximum element of this instance.
Aggregators.MAX_BIGINTS()
Aggregators.MAX_BIGINTS()
Aggregators.MAX_BIGINTS(int)
BigInteger values.
n largest BigInteger values (or fewer if there are fewer
values than n).
Aggregators.MAX_DOUBLES()
Aggregators.MAX_DOUBLES()
Aggregators.MAX_DOUBLES(int)
double values.
n largest double values (or fewer if there are fewer
values than n).
Aggregators.MAX_FLOATS()
Aggregators.MAX_FLOATS()
Aggregators.MAX_FLOATS(int)
float values.
n largest float values (or fewer if there are fewer
values than n).
Aggregators.MAX_INTS()
Aggregators.MAX_INTS()
Aggregators.MAX_INTS(int)
int values.
n largest int values (or fewer if there are fewer
values than n).
Aggregators.MAX_LONGS()
Aggregators.MAX_LONGS()
Aggregators.MAX_LONGS(int)
long values.
n largest long values (or fewer if there are fewer
values than n).
n largest values (or fewer if there are fewer
values than n).
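The "n largest values" entries, and the matching "n smallest values" entries, can be computed with a bounded heap. The following is one common plain-Java approach, offered as an assumption rather than a description of how Aggregators implements MAX_N or MIN_N; `TopNSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical bounded-heap sketch of "the n largest / n smallest values
// (or fewer if there are fewer values than n)".
final class TopNSketch {
    static <T extends Comparable<? super T>> List<T> maxN(Iterable<T> values, int n) {
        // Min-heap holding at most n values: its head is the smallest of the current top n.
        PriorityQueue<T> heap = new PriorityQueue<>();
        for (T v : values) {
            heap.offer(v);
            if (heap.size() > n) heap.poll(); // drop the smallest
        }
        List<T> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder()); // largest first
        return out;
    }

    static <T extends Comparable<? super T>> List<T> minN(Iterable<T> values, int n) {
        // Max-heap variant that keeps the n smallest values.
        PriorityQueue<T> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (T v : values) {
            heap.offer(v);
            if (heap.size() > n) heap.poll(); // drop the largest
        }
        List<T> out = new ArrayList<>(heap);
        Collections.sort(out); // smallest first
        return out;
    }
}
```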
PObject of the minimum element of this instance.
Aggregators.MIN_BIGINTS()
Aggregators.MIN_BIGINTS()
Aggregators.MIN_BIGINTS(int)
BigInteger values.
n smallest BigInteger values (or fewer if there are fewer
values than n).
Aggregators.MIN_DOUBLES()
Aggregators.MIN_DOUBLES()
Aggregators.MIN_DOUBLES(int)
double values.
n smallest double values (or fewer if there are fewer
values than n).
Aggregators.MIN_FLOATS()
Aggregators.MIN_FLOATS()
Aggregators.MIN_FLOATS(int)
float values.
n smallest float values (or fewer if there are fewer
values than n).
Aggregators.MIN_INTS()
Aggregators.MIN_INTS()
Aggregators.MIN_INTS(int)
int values.
n smallest int values (or fewer if there are fewer
values than n).
Aggregators.MIN_LONGS()
Aggregators.MIN_LONGS()
Aggregators.MIN_LONGS(int)
long values.
n smallest long values (or fewer if there are fewer
values than n).
n smallest values (or fewer if there are fewer
values than n).
Tokenizer and return the next String from the Scanner.
Tokenizer and return the next Boolean from the Scanner.
Tokenizer and return the next Double from the Scanner.
Tokenizer and return the next Float from the Scanner.
Tokenizer and return the next Integer from the Scanner.
Tokenizer and return the next Long from the Scanner.
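The tokenizer entries describe wrapping input text in a Scanner configured with delimiter and locale settings, then reading typed values from it. Using only `java.util.Scanner`, the pattern looks roughly like this. The comma delimiter, `Locale.US`, and the `ScannerSketch` name are illustrative assumptions, and skip or keep-index handling is not shown.

```java
import java.util.Locale;
import java.util.Scanner;

// Hypothetical sketch using java.util.Scanner directly (not Crunch's Tokenizer).
// Scanner's own default delimiter is whitespace.
final class ScannerSketch {
    static Scanner tokenize(String line, String delimiterRegex, Locale locale) {
        return new Scanner(line).useDelimiter(delimiterRegex).useLocale(locale);
    }
}
```

For example, `ScannerSketch.tokenize("alice,42,3.5,true", ",", Locale.US)` yields `"alice"`, `42`, `3.5`, and `true` from successive `next()`, `nextInt()`, `nextDouble()`, and `nextBoolean()` calls.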
FilterFns.not(FilterFn)
FilterFns.or(FilterFn...)
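FilterFns.and, FilterFns.or, and FilterFns.not combine filters. As an analogy only, the same composition can be sketched with `java.util.function.Predicate` (not Crunch's `FilterFn`; `FilterSketch` is a hypothetical name):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical plain-Java analogue of and / or / not filter composition.
final class FilterSketch {
    @SafeVarargs
    static <T> Predicate<T> and(Predicate<T>... preds) {
        Predicate<T> result = t -> true;  // identity element for "and"
        for (Predicate<T> p : preds) result = result.and(p);
        return result;
    }

    @SafeVarargs
    static <T> Predicate<T> or(Predicate<T>... preds) {
        Predicate<T> result = t -> false; // identity element for "or"
        for (Predicate<T> p : preds) result = result.or(p);
        return result;
    }

    static <T> Predicate<T> not(Predicate<T> pred) {
        return pred.negate();
    }

    // Keeps the elements for which the predicate returns true.
    static <T> List<T> filter(List<T> input, Predicate<T> pred) {
        return input.stream().filter(pred).collect(Collectors.toList());
    }
}
```

Starting `and` from an always-true predicate and `or` from an always-false one keeps the no-argument cases well defined; whether the Crunch versions behave the same way with no arguments is not stated in this index.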
Tuples.
Aggregators.pairAggregator(Aggregator, Aggregator)
Pair.
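pairAggregator combines two aggregators into one that works on Pairs. The sketch below assumes a hypothetical, simplified `SimpleAgg` interface (it is not Crunch's `Aggregator`) and feeds the first and second component of each pair to separate aggregators, using `Map.Entry` in place of `Pair`:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical, simplified aggregator interface (not Crunch's Aggregator).
interface SimpleAgg<T> {
    void reset();
    void update(T value);
    T result();
}

final class PairAggSketch {
    // Sums long values.
    static SimpleAgg<Long> sumLongs() {
        return new SimpleAgg<Long>() {
            private long sum;
            public void reset() { sum = 0; }
            public void update(Long v) { sum += v; }
            public Long result() { return sum; }
        };
    }

    // Tracks the maximum long value; result() is null if nothing was seen.
    static SimpleAgg<Long> maxLongs() {
        return new SimpleAgg<Long>() {
            private Long max;
            public void reset() { max = null; }
            public void update(Long v) { if (max == null || v > max) max = v; }
            public Long result() { return max; }
        };
    }

    // Aggregates pairs by routing each component to its own aggregator.
    static <A, B> SimpleAgg<Map.Entry<A, B>> pair(SimpleAgg<A> first, SimpleAgg<B> second) {
        return new SimpleAgg<Map.Entry<A, B>>() {
            public void reset() { first.reset(); second.reset(); }
            public void update(Map.Entry<A, B> p) {
                first.update(p.getKey());
                second.update(p.getValue());
            }
            public Map.Entry<A, B> result() {
                return new SimpleEntry<>(first.result(), second.result());
            }
        };
    }
}
```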
PCollection and
returns a new PCollection that is the output of this processing.
PCollection and
returns a new PCollection that is the output of this processing.
PCollection and
returns a new PCollection that is the output of this processing.
parallelDo instance, but returns a
PTable instance instead of a PCollection.
parallelDo instance, but returns a
PTable instance instead of a PCollection.
parallelDo instance, but returns a
PTable instance instead of a PCollection.
parallelDo operation
applied to a PCollection.
PCollection<String> into PCollections of strongly-typed
tuples.
PCollection<String> and returns a PCollection<T> using
the given Extractor<T>.
PCollection<String> and returns a PCollection<T> using
the given Extractor<T> that uses the given PTypeFamily.
PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>>.
PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> that uses the given PTypeFamily.
PTable.
PType instance for PGroupedTable instances.
run or done on the
Pipeline interface that includes details and statistics about the component
stages of the data pipeline.
PObject represents a singleton object value that results from a distributed
computation.
PCollection.
PCollection that represents an immutable,
distributed multi-map of keys and values.
PType specifically for PTable objects.
PType defines a mapping between a data type that is used in a Crunch pipeline and a
serialization and storage format that is used to read/write data from/to HDFS.
PType instances that have the same
serialization/storage backing format.
PTypes from different
PTypeFamily implementations.
Aggregators.quadAggregator(Aggregator, Aggregator, Aggregator, Aggregator)
Tuple4.
Iterable that contains the contents of this source.
Source into a PCollection that is
available to jobs run using this Pipeline instance.
TableSource instances that map to
PTables.
Source interface that indicates that a
Source instance may be read as a series of records by the client
code.
SourceTarget instance can be read
into the local client.
PCollection with each element
equally likely to be included in the sample.
PTables.
ListenableFuture to allow clients to control
job execution.
PCollection with an independent probability in order to sample some
fraction of the overall data set, or by using reservoir sampling in order to pull a uniform
or weighted sample of fixed size from a PCollection of an unknown size.
PCollection with the given probability.
PCollection using a given seed.
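The sampling entries describe keeping each element independently with a given probability, optionally with a fixed seed. A minimal Bernoulli-sampling sketch in plain Java (hypothetical `SampleSketch`; how Crunch derives randomness in each task is not covered here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical Bernoulli-sampling sketch: each element is kept independently
// with probability p; a fixed seed makes the selection reproducible.
final class SampleSketch {
    static <T> List<T> sample(List<T> input, double p, long seed) {
        if (p < 0.0 || p > 1.0) throw new IllegalArgumentException("p must be in [0, 1]");
        Random rng = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T e : input) {
            // nextDouble() is in [0, 1), so p = 1.0 keeps everything and p = 0.0 keeps nothing
            if (rng.nextDouble() < p) out.add(e);
        }
        return out;
    }
}
```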
PTable<K, V> analogue of the sample function.
PTable<K, V> analogue of the sample function, with the seed argument
exposed for testing purposes.
equals method for the input objects.
PCollection
will cause it to change in size.
PTable<K, Pair<V1, V2>> collection.
SourceTarget<T> instance from the SequenceFile(s) at the given path name
from the value field of each key-value pair in the SequenceFile(s).
SourceTarget<T> instance from the SequenceFile(s) at the given Path
from the value field of each key-value pair in the SequenceFile(s).
SourceTarget<T> instance from the SequenceFile(s) at the given path name
from the value field of each key-value pair in the SequenceFile(s).
SourceTarget<T> instance from the SequenceFile(s) at the given Path
from the value field of each key-value pair in the SequenceFile(s).
TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name
from the key-value pairs in the SequenceFile(s).
TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path
from the key-value pairs in the SequenceFile(s).
TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name
from the key-value pairs in the SequenceFile(s).
TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path
from the key-value pairs in the SequenceFile(s).
Source<T> instance from the SequenceFile(s) at the given path name
from the value field of each key-value pair in the SequenceFile(s).
Source<T> instance from the SequenceFile(s) at the given Path
from the value field of each key-value pair in the SequenceFile(s).
Source<T> instance from the SequenceFile(s) at the given path name
from the value field of each key-value pair in the SequenceFile(s).
Source<T> instance from the SequenceFile(s) at the given Path
from the value field of each key-value pair in the SequenceFile(s).
TableSource<K, V> instance for the SequenceFile(s) at the given path name.
TableSource<K, V> instance for the SequenceFile(s) at the given Path.
TableSource<K, V> instance for the SequenceFile(s) at the given path name.
TableSource<K, V> instance for the SequenceFile(s) at the given Path.
Target at the given path name that writes data to
SequenceFiles.
Target at the given Path that writes data to
SequenceFiles.
FileNamingScheme that uses an incrementing sequence number in
order to generate unique file names.
PCollection instances.
Configuration to use with this pipeline.
TaskInputOutputContext to this
DoFn instance.
Scanner that is returned by the constructed
TokenizerFactory.
PCollection instances.
PCollection using the natural ordering of its elements in ascending order.
PCollection using the natural order of its elements with the given Order.
PCollection using the natural ordering of its elements in
the order specified using the given number of reducers.
PTable using the natural ordering of its keys in ascending order.
PTable using the natural ordering of its keys with the given Order.
PTable using the natural ordering of its keys in the
order specified with a client-specified number of reducers.
sortPairs(coll, by(2, ASCENDING), by(1, DESCENDING))
Column numbering is 1-based.
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T>.
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PTable<U, V>.
DoFns that are used by Crunch's Sort library.
GenericRecord instance.
Tuple instance.
Tuple instance.
PCollection of Pairs using the specified column
ordering.
PCollection of Tuple4s using the specified column
ordering.
PCollection of Tuple3s using the specified column
ordering.
PCollection of tuples using the specified column ordering.
PCollection of TupleNs using the specified column
ordering and a client-specified number of reducers.
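The column-ordering entries (for example `sortPairs(coll, by(2, ASCENDING), by(1, DESCENDING))`, with 1-based column numbers) can be mimicked in memory with chained comparators. `Order`, `ColumnOrder`, and `by` below are hypothetical stand-ins, not Crunch's Sort classes, and rows are plain lists rather than Pairs or Tuples.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory sketch of multi-column sorting with 1-based column numbers.
final class ColumnSortSketch {
    enum Order { ASCENDING, DESCENDING }

    static final class ColumnOrder {
        final int column;   // 1-based column number
        final Order order;

        ColumnOrder(int column, Order order) {
            this.column = column;
            this.order = order;
        }
    }

    static ColumnOrder by(int column, Order order) {
        return new ColumnOrder(column, order);
    }

    // Sorts rows by the given columns in priority order; later columns only break ties.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<List<Comparable>> sortRows(List<List<Comparable>> rows, ColumnOrder... orders) {
        Comparator<List<Comparable>> cmp = (a, b) -> 0;
        for (ColumnOrder o : orders) {
            int idx = o.column - 1; // convert the 1-based column number to a 0-based index
            Comparator<List<Comparable>> byColumn = (a, b) -> a.get(idx).compareTo(b.get(idx));
            Comparator<List<Comparable>> step =
                o.order == Order.DESCENDING ? byColumn.reversed() : byColumn;
            cmp = cmp.thenComparing(step);
        }
        List<List<Comparable>> sorted = new ArrayList<>(rows);
        sorted.sort(cmp);
        return sorted;
    }
}
```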
Source represents an input data set that is an input to one or more
MapReduce jobs.
Source and the
Target interfaces.
Aggregators.STRING_CONCAT(String, boolean)
Aggregators.STRING_CONCAT(String, boolean, long, long)
Aggregators.SUM_BIGINTS()
Aggregators.SUM_BIGINTS()
BigInteger values.
Aggregators.SUM_DOUBLES()
Aggregators.SUM_DOUBLES()
double values.
Aggregators.SUM_FLOATS()
Aggregators.SUM_FLOATS()
float values.
Aggregators.SUM_INTS()
Aggregators.SUM_INTS()
int values.
Aggregators.SUM_LONGS()
Aggregators.SUM_LONGS()
long values.
Source implementations that return a PTable.
TableSource and the
Target interfaces.
Target represents the output destination of a Crunch PCollection
in the context of a Crunch job.
TemporaryPath.
SourceTarget<String> instance for the text file(s) at the given path name.
SourceTarget<String> instance for the text file(s) at the given Path.
SourceTarget<T> instance for the text file(s) at the given path name using
the provided PType<T> to convert the input text.
SourceTarget<T> instance for the text file(s) at the given Path using
the provided PType<T> to convert the input text.
Source<String> instance for the text file(s) at the given path name.
Source<String> instance for the text file(s) at the given Path.
Source<T> instance for the text file(s) at the given path name using
the provided PType<T> to convert the input text.
Source<T> instance for the text file(s) at the given Path using
the provided PType<T> to convert the input text.
Target at the given path name that writes data to
text files.
Target at the given Path that writes data to
text files.
Target types.
CombineFn adapter around the given aggregator.
Scanner instance and provides support for returning only a subset
of the fields returned by the underlying Scanner.
Tokenizer instance.
Tokenizer instances for input strings that use a fixed
set of delimiters, skip patterns, locales, and sets of indices to keep or drop.
TokenizerFactory instances using the Builder pattern.
Partitioner instance that can work with either Avro or Writable-formatted
keys.
Aggregators.tripAggregator(Aggregator, Aggregator, Aggregator)
Tuple3.
PCollections.
Tuples.
Tuples.
Aggregators.tupleAggregator(Aggregator...)
Tuple.
Tuple instance for an arbitrary number of values.
Tuple interface.
PCollection instance that acts as the union of this
PCollection and the given PCollection.
PCollection instance that acts as the union of this
PCollection and the input PCollections.
PTable instance that acts as the union of this
PTable and the other PTables.
PTable instance that acts as the union of this
PTable and the input PTables.
equals method for
the input objects.
PTable<K, V> as a PCollection<V>.
PCollection made up of the values in this PTable.
PCollection, where the second term in
the input Pair is a numerical weight.
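The weighted-sample entry takes pairs whose second term is a numerical weight. One well-known way to draw a weighted sample of fixed size in a single pass is the Efraimidis-Spirakis "A-Res" reservoir scheme, sketched below. This swaps in that technique as an assumption and does not claim it is what Crunch uses; `WeightedReservoirSketch` is hypothetical and `Map.Entry` stands in for `Pair`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical A-Res sketch: each item gets key u^(1/w) with u uniform in (0, 1];
// the k items with the largest keys form the sample.
final class WeightedReservoirSketch {
    static <T> List<T> sample(Iterable<Map.Entry<T, Double>> weighted, int k, long seed) {
        Random rng = new Random(seed);
        // Min-heap on key, so the head is the weakest current member of the reservoir.
        PriorityQueue<Map.Entry<T, Double>> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Map.Entry<T, Double> e) -> e.getValue()));
        for (Map.Entry<T, Double> item : weighted) {
            double w = item.getValue();
            if (w <= 0) continue; // items with non-positive weight are never sampled here
            double u = 1.0 - rng.nextDouble(); // in (0, 1], avoids u == 0
            double key = Math.pow(u, 1.0 / w);
            heap.offer(new SimpleEntry<>(item.getKey(), key));
            if (heap.size() > k) heap.poll(); // evict the smallest key
        }
        List<T> out = new ArrayList<>();
        for (Map.Entry<T, Double> e : heap) out.add(e.getKey());
        return out;
    }
}
```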
WritableTypeFamily for convenient static importing.
Writable-based implementation of the
PTypeFamily interface.
PCollection to the given Target,
using the storage format specified by the target.
PCollection to the given Target,
using the given Target.WriteMode to handle existing
targets.
PCollection to the given Target,
using the storage format specified by the target and the given
WriteMode for cases where the referenced Target
already exists.
PTable to the given Target.
PTable to the given Target, using the
given Target.WriteMode to handle existing targets.
out.
Tuple with a constructor that
has the given extractor types that uses the given TokenizerFactory
for parsing the sub-fields.
TokenizerFactory
for parsing the sub-fields.
TokenizerFactory
for parsing the sub-fields.
TokenizerFactory
for parsing the sub-fields.
TokenizerFactory
for parsing the sub-fields.