Extractor
instances that delegate the parsing of fields to other
Extractor
instances, primarily used for constructing composite records that implement
the Tuple
interface.
Extractor
instances that construct a single
object from a block of text stored in a String
, with support for error handling
and reporting.
Target
instance is compatible with the
given PType
.
PCollection
instances.
PCollection
that contains the result of aggregating all values in this instance.
Aggregator
s.
Converter
instance
before (for outputs) or after (for inputs) using the associated PType#getInputMapFn
and PType#getOutputMapFn calls.
PObject
encapsulating a Map
made up of the keys and values in this
PTable
.
PObject
encapsulating a Map
made up of the keys and values in this
PTable
.
PCollection<Pair<K, V>>
to a PTable<K, V>
.
SourceTarget
type that corresponds to this Target
for the given PType
, if possible.
SourceTarget
types, which may be treated as both a Source
and a Target
.
SourceTarget<T>
instance from the Avro file(s) at the given path name.
SourceTarget<T>
instance from the Avro file(s) at the given Path
.
SourceTarget<GenericData.Record>
by reading the schema of the Avro file
at the given path.
SourceTarget<GenericData.Record>
by reading the schema of the Avro file
at the given path.
SourceTarget<GenericData.Record>
by reading the schema of the Avro file
at the given path using the FileSystem
information contained in the given
Configuration
instance.
SourceTarget<T>
instance from the Avro file(s) at the given path name.
SourceTarget<T>
instance from the Avro file(s) at the given Path
.
Source<T>
instance from the Avro file(s) at the given path name.
Source<T>
instance from the Avro file(s) at the given Path
.
Source<T>
instance from the Avro file(s) at the given Path
s.
Source<T>
instance from the Avro file(s) at the given path name.
Source<T>
instance from the Avro file(s) at the given Path
.
Source<T>
instance from the Avro file(s) at the given Path
s.
Source<GenericData.Record>
by reading the schema of the Avro file
at the given path.
Source<GenericData.Record>
by reading the schema of the Avro file
at the given path.
Source<GenericData.Record>
by reading the schema of the Avro file
at the given paths.
Source<GenericData.Record>
by reading the schema of the Avro file
at the given path using the FileSystem
information contained in the given
Configuration
instance.
Source<GenericData.Record>
by reading the schema of the Avro file
at the given paths using the FileSystem
information contained in the given
Configuration
instance.
Target
at the given path name that writes data to
Avro files.
Target
at the given Path
that writes data to
Avro files.
InputFormat
for Avro data files.
OutputFormat
for Avro data files.
FileOutputFormat
that takes in a Utf8
and an Avro record and writes the Avro records to
a sub-directory of the output path whose name is equal to the string-form of the Utf8
.
AvroTypeFamily
for convenient static importing.
InputFormat
for text files.
BigInteger
type.
TokenizerFactory
with settings determined by this
Builder
instance.
CachingOptions.Builder
instance to use for specifying the caching options for a particular
PCollection<T>
.
TokenizerFactory.Builder
instance.
PTable
.
PTable
.
CachingOptions
.
CachingOptions
.
PCollection<T>
is cached for subsequent processing.
CachingOptions
for a PCollection
.
PTable
or PCollection
instances.
Pair
instances emitted by DoFn
into
separate PCollection
instances.
DoFn
is
associated with.
FilterFn
is
associated with.
DoFn
is
associated with.
DoFn
is
associated with.
running
the pipeline.
PTable
arguments.
PTable
arguments with a user-specified degree of parallelism (a.k.a. the number of
reducers).
PTable
arguments.
PTable
arguments with a user-specified degree of parallelism (a.k.a. the number of
reducers).
PTable
arguments.
PTable
arguments with a user-specified degree of parallelism (a.k.a. the number of
reducers).
PTable
arguments.
PTable
arguments with a user-specified degree of parallelism
(a.k.a. the number of reducers). The largest table should come last in the ordering.
DoFn
implementation that converts an Iterable
of
values into a single value.
CombineFn
.
CombineFn
instances.
Aggregator
.
Aggregator
instances.
comm
utility.
Configuration
object associated with the
Job
that includes these options.
Configuration
instance(s) that are used to
read and write this SourceTarget<T>
.
Configuration
object.
bundle
with mode specific settings for the specific FormatBundle
.
conf
with mode specific settings.
AvroMode.configure(org.apache.hadoop.conf.Configuration)
conf
with mode specific settings for use during the shuffle phase.
DoFn
, or takes the output of a DoFn
and writes it to the
output key/values.
File
.
Path
.
PTable
that contains the unique elements of this collection mapped to a count
of their occurrences.
PTable
that contains the unique elements of this collection mapped to a count
of their occurrences.
PTable
instance that contains the counts of each unique
element of this PCollection.
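The counting entries above can be pictured with a small in-memory analogue. The sketch below is plain Java over an Iterable, not the library's implementation; the class name CountSketch and the method count are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class CountSketch {
    /** Maps each distinct element to the number of times it occurs in the input. */
    public static <T extends Comparable<T>> Map<T, Long> count(Iterable<T> elements) {
        Map<T, Long> counts = new TreeMap<>(); // sorted keys keep the printed order stable
        for (T element : elements) {
            counts.merge(element, 1L, Long::sum); // insert 1 or add 1 to the running count
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("b", "a", "b", "c", "b")));
    }
}
```

In a distributed setting the same per-key addition would typically be applied after grouping by element; that part is outside the scope of this sketch.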
Scanner
instance that wraps the input string and uses the delimiter,
skip, and locale settings for this TokenizerFactory
instance.
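The entry above describes wrapping an input string in a Scanner configured with a delimiter and a locale. A minimal sketch of that idea using only java.util.Scanner follows; the class DelimitedScannerSketch and the method tokens are hypothetical names, and the skip setting mentioned above is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class DelimitedScannerSketch {
    /** Splits the input on the given delimiter regex and returns the raw tokens in order. */
    public static List<String> tokens(String input, String delimiterRegex, Locale locale) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            // Both setters return the Scanner itself, so they can be chained.
            scanner.useDelimiter(delimiterRegex).useLocale(locale);
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("alpha,beta,gamma", ",", Locale.US));
    }
}
```

The locale matters mainly for numeric parsing (for example nextDouble), which this sketch does not exercise.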
MapsideJoinStrategy
instance that will load its left-side table into memory,
and will materialize the contents of the left-side table to disk before running the in-memory join.
MapsideJoinStrategy
instance that will load its left-side table into memory.
PType<S>
that respects the given column
orderings.
PTable
s (using the same
strategy as Pig's CROSS operator).
PTable
s (using the same
strategy as Pig's CROSS operator).
PCollection
s (using the
same strategy as Pig's CROSS operator).
PCollection
s (using the
same strategy as Pig's CROSS operator).
InputFormat
instances within a single
Crunch MapReduce job.
CrunchInputs
for handling multiple OutputFormat
instances
writing to multiple files within a single MapReduce job.
RuntimeException
implementation that includes some additional options
for the Crunch execution engine to track reporting status.
Tool
interface that creates a Pipeline
instance and provides methods for working with the Pipeline from inside of
the Tool's run method.
CachingOptions
with the default caching settings.
ReadableData<T>
interface by delegating to a ReadableData<S>
instance
and passing its contents through a DoFn<S, T>
.
TokenizerFactory
instances constructed by
this instance.
PCollection
.
PCollection
that contains the unique elements of a
given input PCollection
.
PTable<K, V>
analogue of the distinct
function.
distinct
operation that gives the client more control over how frequently
elements are flushed to disk in order to allow control over performance or
memory consumption.
PTable<K, V>
analogue of the distinct
function.
Iterator<T>
that combines a delegate Iterator<S>
and a DoFn<S, T>
, generating
data by passing the contents of the iterator through the function.
run
.
DoFn
.
Level
.
Appender
at the specified Level
.
Enum
type.
extract
on this instance
threw an exception that was handled.
MapFn
into a key-value pair that is
used to convert from a PCollection<V>
to a PTable<K, V>
.
Scanner
object.
Extractor
types.
Extractor
encountered when parsing
input data.
PCollection
.
PCollection
.
PTable
.
PTable
.
DoFn
for the common case of filtering the members of a
PCollection
based on a boolean condition.
FilterFn
implementations.
Counter
class changed incompatibly between Hadoop 1 and 2
(from a class to an interface) so user programs should avoid this method and use
PipelineResult.StageResult.getCounterValue(Enum)
and/or PipelineResult.StageResult.getCounterDisplayName(Enum)
.
n
values (or fewer if there are fewer values than n
).
InputFormat
or OutputFormat
and any extra
configuration information that format class needs to run.
TableSource<K, V>
for reading data from files that have custom
FileInputFormat<K, V>
implementations not covered by the provided TableSource
and Source
factory methods.
TableSource<K, V>
for reading data from files that have custom
FileInputFormat<K, V>
implementations not covered by the provided TableSource
and Source
factory methods.
TableSource<K, V>
for reading data from files that have custom
FileInputFormat<K, V>
implementations not covered by the provided TableSource
and Source
factory methods.
TableSource<K, V>
for reading data from files that have custom
FileInputFormat
implementations not covered by the provided TableSource
and Source
factory methods.
TableSource<K, V>
for reading data from files that have custom
FileInputFormat
implementations not covered by the provided TableSource
and Source
factory methods.
TableSource<K, V>
for reading data from files that have custom
FileInputFormat
implementations not covered by the provided TableSource
and Source
factory methods.
Target
at the given path name that writes data to
a custom FileOutputFormat
.
Target
at the given Path
that writes data to
a custom FileOutputFormat
.
Source
types.
AvroMode.AVRO_MODE_PROPERTY
property in the conf
.
AvroMode.AVRO_SHUFFLE_MODE_PROPERTY
property in the conf
.
AvroMode
based upon the specified type
.
PTable
s.
Generic
types.
Configuration
instance associated with this pipeline.
Converter
used for mapping the inputs from this instance
into PCollection
or PTable
values.
Converter
to use for mapping from the output PCollection
into the output values expected by this instance.
Counter
class changed incompatibly between Hadoop 1 and 2
(from a class to an interface) so user programs should avoid this method and use
PipelineResult.StageResult.getCounterNames()
.
GenericData
instance based on the mode type.
SourceTarget
that is able to read/write data using the serialization format
specified by this PType
.
TokenizerFactory
that uses whitespace as a delimiter and does
not skip any input fields.
Extractor
in case of an
error.
Pair
.
PTypeFamily
that this PType
belongs to.
Pair
or TupleN
.
File
below the temporary directory.
PGroupedTable
value.
PGroupedTableType
containing serialization information for
this PGroupedTable
.
PType
of the key.
Source
was most recently
modified (e.g., because an input file was edited or new files were added to
a directory).
PCollection
.
Path
below the temporary directory.
Pipeline
associated with this PCollection.
PTableType
of this PTable
.
PType
associated with this data type for the
given PTypeFamily
.
PType
of this PCollection
.
DatumReader
based on the schema
.
null
.
Path
.
PCollection
in
bytes.
Source
.
Extractor
instance
encountered while parsing input data.
TupleFactory
for a given Tuple implementation.
PType
for this source.
PType
.
PTypeFamily
of this PCollection
.
PObject
.
PType
of the value.
DatumWriter
based on the schema
.
GroupingOptions
to control how the grouping is executed.
groupByKey
operation in order to
exercise finer control over how the partitioning, grouping, and sorting of
keys is performed.
GroupingOptions
instances.
WriteMode
to this Target
instance.
Scanner
has any tokens remaining.
Extractor
during the
start of a map or reduce task.
PTable
s.
Configuration
instance that is used to read
this Source<T>
.
PTable
instances based on a common
lastKey.
PTable
instances using a user-specified JoinFn
.
PTable
s.
DoFn
for performing joins.
ObjectMapper
.
PTable<K, V>
as a PCollection<K>
.
PCollection
made up of the keys in this PTable.
n
values (or fewer if there are fewer values than n
).
PTable
s.
PCollection
.
Locale
to use with the TokenizerFactory
returned by
this Builder
instance.
DoFn
for the common case of emitting exactly one value for each
input record.
PTable<K1, V>
to a PTable<K2, V>
using the given MapFn<K1, K2>
on
the keys of the PTable
.
PTable<K1, V>
to a PTable<K2, V>
using the given MapFn<K1, K2>
on
the keys of the PTable
.
PTable
that has the same values as this instance, but
uses the given function to map the keys.
PTable
that has the same values as this instance, but
uses the given function to map the keys.
PTable
s.
MapsideJoinStrategy.create()
factory method instead
MapsideJoinStrategy.create(boolean)
factory method instead
PTable<K, U>
to a PTable<K, V>
using the given MapFn<U, V>
on
the values of the PTable
.
PTable<K, U>
to a PTable<K, V>
using the given MapFn<U, V>
on
the values of the PTable
.
mapValues
function for PGroupedTable<K, U>
collections.
mapValues
function for PGroupedTable<K, U>
collections.
Iterable<V>
elements of each record to a new type.
Iterable<V>
elements of each record to a new type.
PTable
that has the same keys as this instance, but
uses the given function to map the values.
PTable
that has the same keys as this instance, but
uses the given function to map the values.
PObject
of the maximum element of this instance.
BigInteger
values.
n
largest BigInteger
values (or fewer if there are fewer
values than n
).
double
values.
n
largest double
values (or fewer if there are fewer
values than n
).
float
values.
n
largest float
values (or fewer if there are fewer
values than n
).
int
values.
n
largest int
values (or fewer if there are fewer
values than n
).
long
values.
n
largest long
values (or fewer if there are fewer
values than n
).
n
largest values (or fewer if there are fewer
values than n
).
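The "n largest values (or fewer if there are fewer values than n)" behaviour described above can be sketched with a bounded min-heap. This is an illustration in plain Java, not the library's aggregator; LargestNSketch and largestN are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class LargestNSketch {
    /** Returns up to n of the largest values, in descending order. */
    public static <T extends Comparable<T>> List<T> largestN(Iterable<T> values, int n) {
        PriorityQueue<T> heap = new PriorityQueue<>(); // min-heap: smallest kept value on top
        for (T value : values) {
            heap.offer(value);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest so at most n values remain
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(largestN(Arrays.asList(5, 1, 9, 3), 2));
    }
}
```

The heap never holds more than n + 1 elements, so memory stays bounded regardless of input size. The "n smallest" variant is the mirror image, using a max-heap instead.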
PObject
of the minimum element of this instance.
BigInteger
values.
n
smallest BigInteger
values (or fewer if there are fewer
values than n
).
double
values.
n
smallest double
values (or fewer if there are fewer
values than n
).
float
values.
n
smallest float
values (or fewer if there are fewer
values than n
).
int
values.
n
smallest int
values (or fewer if there are fewer
values than n
).
long
values.
n
smallest long
values (or fewer if there are fewer
values than n
).
n
smallest values (or fewer if there are fewer
values than n
).
Tokenizer
and return the next String from the Scanner
.
Tokenizer
and return the next Boolean from the Scanner
.
Tokenizer
and return the next Double from the Scanner
.
Tokenizer
and return the next Float from the Scanner
.
Tokenizer
and return the next Integer from the Scanner
.
Tokenizer
and return the next Long from the Scanner
.
DeepCopier
that does nothing, and just returns the input value without copying anything.
Configuration
instance that is used to write
this Target
.
AvroMode.withFactory(ReaderWriterFactory)
instead.
Tuple
s.
Pair
.
PCollection
and
returns a new PCollection
that is the output of this processing.
PCollection
and
returns a new PCollection
that is the output of this processing.
PCollection
and
returns a new PCollection
that is the output of this processing.
parallelDo
instance, but returns a
PTable
instance instead of a PCollection
.
parallelDo
instance, but returns a
PTable
instance instead of a PCollection
.
parallelDo
instance, but returns a
PTable
instance instead of a PCollection
.
parallelDo
operation
applied to a PCollection
.
PCollection<String>
into PCollection
s of strongly-typed
tuples.
PCollection<String>
and returns a PCollection<T>
using
the given Extractor<T>
.
PCollection<String>
and returns a PCollection<T>
using
the given Extractor<T>
that uses the given PTypeFamily
.
PCollection<String>
and returns a PTable<K, V>
using
the given Extractor<Pair<K, V>>
.
PCollection<String>
and returns a PTable<K, V>
using
the given Extractor<Pair<K, V>>
that uses the given PTypeFamily
.
PTable
, which corresponds to the output of
the shuffle phase of a MapReduce job.
PType
instance for PGroupedTable
instances.
run
or done
on the
Pipeline interface that includes details and statistics about the component
stages of the data pipeline.
PObject
represents a singleton object value that results from a distributed
computation.
PCollection
.
SerializableSupplier
to provide
an ExtensionRegistry
to use in reading the given protobuf.
PCollection
that represents an immutable,
distributed multi-map of keys and values.
PType
specifically for PTable
objects.
PType
defines a mapping between a data type that is used in a Crunch pipeline and a
serialization and storage format that is used to read/write data from/to HDFS.
PType
instances that have the same
serialization/storage backing format.
PType
s from different
PTypeFamily
implementations.
Tuple4
.
Iterable
that contains the contents of this source.
Source
into a PCollection
that is
available to jobs run using this Pipeline
instance.
TableSource
instances that map to
PTable
s.
Source
interface that indicates that a
Source
instance may be read as a series of records by the client
code.
SourceTarget
instance can be read
into the local client.
Reflect
types.
WritableComparable
class so that it can be used for comparing the fields inside of
tuple types (e.g., pairs
, trips
, tupleN
, etc.) for use in sorts and
secondary sorts.
WritableComparable
class with a given integer code to use for serializing
and deserializing instances of this class that are defined inside of tuple types (e.g., pairs
,
trips
, tupleN
, etc.). Unregistered Writables are always serialized to bytes and
cannot be used in comparisons (e.g., sorts and secondary sorts) according to their underlying types.
PCollection
with each element
equally likely to be included in the sample.
PTable
s.
ListenableFuture
to allow clients to control
job execution.
PCollection
with an independent probability in order to sample some
fraction of the overall data set, or by using reservoir sampling in order to pull a uniform
or weighted sample of fixed size from a PCollection
of an unknown size.
PCollection
with the given probability.
PCollection
using a given seed.
PTable<K, V>
analogue of the sample
function.
PTable<K, V>
analogue of the sample
function, with the seed argument
exposed for testing purposes.
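The sampling entries above mention reservoir sampling for drawing a fixed-size uniform sample from input of unknown size, and a seed argument exposed for testing. The sketch below shows the classic single-pass technique (often called Algorithm R) in plain Java; it is an assumption that this resembles what the library does internally, and ReservoirSampleSketch and sample are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class ReservoirSampleSketch {
    /** Draws a uniform sample of at most k items in a single pass over the stream. */
    public static <T> List<T> sample(Iterable<T> stream, int k, Random rng) {
        List<T> reservoir = new ArrayList<>();
        long seen = 0;
        for (T item : stream) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item); // fill phase: keep the first k items
            } else {
                long j = (long) (rng.nextDouble() * seen); // uniform index in [0, seen)
                if (j < k) {
                    reservoir.set((int) j, item); // replace with probability k / seen
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // A fixed seed makes the sample reproducible, which is useful in tests.
        System.out.println(sample(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8), 3, new Random(7L)));
    }
}
```

Passing the Random in explicitly mirrors the idea of a seed argument: two runs with the same seed and the same input order yield the same sample.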
equals
method for the input objects.
PCollection
will cause it to change in size.
PTable<K, Pair<V1, V2>>
collection.
SourceTarget<T>
instance from the SequenceFile(s) at the given path name
from the value field of each key-value pair in the SequenceFile(s).
SourceTarget<T>
instance from the SequenceFile(s) at the given Path
from the value field of each key-value pair in the SequenceFile(s).
SourceTarget<T>
instance from the SequenceFile(s) at the given path name
from the value field of each key-value pair in the SequenceFile(s).
SourceTarget<T>
instance from the SequenceFile(s) at the given Path
from the value field of each key-value pair in the SequenceFile(s).
TableSourceTarget<K, V>
instance from the SequenceFile(s) at the given path name
from the key-value pairs in the SequenceFile(s).
TableSourceTarget<K, V>
instance from the SequenceFile(s) at the given Path
from the key-value pairs in the SequenceFile(s).
TableSourceTarget<K, V>
instance from the SequenceFile(s) at the given path name
from the key-value pairs in the SequenceFile(s).
TableSourceTarget<K, V>
instance from the SequenceFile(s) at the given Path
from the key-value pairs in the SequenceFile(s).
Source<T>
instance from the SequenceFile(s) at the given path name
from the value field of each key-value pair in the SequenceFile(s).
Source<T>
instance from the SequenceFile(s) at the given Path
from the value field of each key-value pair in the SequenceFile(s).
Source<T>
instance from the SequenceFile(s) at the given Path
s
from the value field of each key-value pair in the SequenceFile(s).
Source<T>
instance from the SequenceFile(s) at the given path name
from the value field of each key-value pair in the SequenceFile(s).
Source<T>
instance from the SequenceFile(s) at the given Path
from the value field of each key-value pair in the SequenceFile(s).
Source<T>
instance from the SequenceFile(s) at the given Path
s
from the value field of each key-value pair in the SequenceFile(s).
TableSource<K, V>
instance for the SequenceFile(s) at the given path name.
TableSource<K, V>
instance for the SequenceFile(s) at the given Path
.
TableSource<K, V>
instance for the SequenceFile(s) at the given Path
s.
TableSource<K, V>
instance for the SequenceFile(s) at the given path name.
TableSource<K, V>
instance for the SequenceFile(s) at the given Path
.
TableSource<K, V>
instance for the SequenceFile(s) at the given Path
s.
Target
at the given path name that writes data to
SequenceFiles.
Target
at the given Path
that writes data to
SequenceFiles.
FileNamingScheme
that uses an incrementing sequence number in
order to generate unique file names.
Supplier
interface that indicates that an instance
will also implement Serializable
, which makes this object suitable for use
with Crunch's DoFns when we need to construct an instance of a non-serializable
type for use in processing.
PCollection
instances.
PType
that
relies on this instance.
Configuration
to use with this pipeline.
TaskInputOutputContext
to this
DoFn
instance.
PCollection
is balanced across reducers
and output files.
PCollection<T>
that has the same contents as its input argument but will
be written to a fixed number of output files.
Scanner
that is returned by the constructed
TokenizerFactory
.
PCollection
instances.
PCollection
using the natural ordering of its elements in ascending order.
PCollection
using the natural order of its elements with the given Order
.
PCollection
using the natural ordering of its elements in
the order specified using the given number of reducers.
PTable
using the natural ordering of its keys in ascending order.
PTable
using the natural ordering of its keys with the given Order
.
PTable
using the natural ordering of its keys in the
order specified with a client-specified number of reducers.
sortPairs(coll, by(2, ASCENDING), by(1, DESCENDING))
Column numbering is 1-based.
PTable
instance and then apply a
DoFn
to the resulting sorted data to yield an output PCollection<T>
.
PTable
instance and then apply a
DoFn
to the resulting sorted data to yield an output PCollection<T>
, using
the given number of reducers.
PTable
instance and then apply a
DoFn
to the resulting sorted data to yield an output PTable<U, V>
.
PTable
instance and then apply a
DoFn
to the resulting sorted data to yield an output PTable<U, V>
, using
the given number of reducers.
DoFn
s that are used by Crunch's Sort
library.
GenericRecord
instance.
Tuple
instance.
Tuple
instance.
PCollection
of Pair
s using the specified column
ordering.
PCollection
of Tuple4
s using the specified column
ordering.
PCollection
of Tuple3
s using the specified column
ordering.
PCollection
of tuples using the specified column ordering.
PCollection
of TupleN
s using the specified column
ordering and a client-specified number of reducers.
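The column-ordering entries above, together with the example sortPairs(coll, by(2, ASCENDING), by(1, DESCENDING)) and its 1-based column numbering, can be mimicked in memory with chained comparators. The sketch below is plain Java and is not the library's Sort API; ColumnOrderSketch, its by helper (which takes a boolean rather than an Order value) and sortRows are hypothetical, and rows are modelled as String arrays for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ColumnOrderSketch {
    /** Orders rows by one column; the column index is 1-based. */
    public static Comparator<String[]> by(int column, boolean ascending) {
        Comparator<String[]> c = Comparator.comparing((String[] row) -> row[column - 1]);
        return ascending ? c : c.reversed();
    }

    /** Returns a sorted copy: ties under the first ordering are broken by the second. */
    public static List<String[]> sortRows(List<String[]> rows,
                                          Comparator<String[]> first,
                                          Comparator<String[]> second) {
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort(first.thenComparing(second));
        return sorted;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"b", "1"}, new String[] {"a", "1"}, new String[] {"c", "0"});
        // Column 2 ascending, then column 1 descending.
        for (String[] row : sortRows(rows, by(2, true), by(1, false))) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Values are compared as strings here, so numeric columns would need a parsing key extractor in a fuller version.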
Source
represents an input data set that is an input to one or more
MapReduce jobs.
Source
and the
Target
interfaces.
Specific
types.
PCollection
of any Pair
of objects into a Pair of
PCollections, to allow for the output of a DoFn to be handled using
separate channels.
PCollection
of any Pair
of objects into a Pair of
PCollections, to allow for the output of a DoFn to be handled using
separate channels.
BigInteger
values.
double
values.
float
values.
int
values.
long
values.
Source
implementations that return a PTable
.
TableSource
and the
Target
interfaces.
Target
represents the output destination of a Crunch PCollection
in the context of a Crunch job.
TemporaryPath
.
SourceTarget<String>
instance for the text file(s) at the given path name.
SourceTarget<String>
instance for the text file(s) at the given Path
.
SourceTarget<T>
instance for the text file(s) at the given path name using
the provided PType<T>
to convert the input text.
SourceTarget<T>
instance for the text file(s) at the given Path
using
the provided PType<T>
to convert the input text.
Source<String>
instance for the text file(s) at the given path name.
Source<String>
instance for the text file(s) at the given Path
.
Source<String>
instance for the text file(s) at the given Path
s.
Source<T>
instance for the text file(s) at the given path name using
the provided PType<T>
to convert the input text.
Source<T>
instance for the text file(s) at the given Path
using
the provided PType<T>
to convert the input text.
Source<T>
instance for the text file(s) at the given Path
s using
the provided PType<T>
to convert the input text.
Target
at the given path name that writes data to
text files.
Target
at the given Path
that writes data to
text files.
Target
types.
CombineFn
adapter around the given aggregator.
Scanner
instance and provides support for returning only a subset
of the fields returned by the underlying Scanner
.
Tokenizer
instance.
Tokenizer
instances for input strings that use a fixed
set of delimiters, skip patterns, locales, and sets of indices to keep or drop.
TokenizerFactory
instances using the Builder pattern.
Partitioner
instance that can work with either Avro or Writable-formatted
keys.
Tuple3
.
PCollection
s.
Tuple
s.
Tuple
s.
Tuple
.
Tuple
instance for an arbitrary number of values.
Tuple
interface.
Tuple
.
PCollection
instance that acts as the union of this
PCollection
and the given PCollection
.
PCollection
instance that acts as the union of this
PCollection
and the input PCollection
s.
PTable
instance that acts as the union of this
PTable
and the other PTable
s.
PTable
instance that acts as the union of this
PTable
and the input PTable
s.
equals
method for
the input objects.
UUID
type.
PTable<K, V>
as a PCollection<V>
.
PCollection
made up of the values in this PTable.
PCollection
, where the second term in
the input Pair
is a numerical weight.
AvroMode
instance which will utilize the factory
instance
for creating Avro readers and writers.
WritableTypeFamily
for convenient static importing.
Writable
-based implementation of the
PTypeFamily
interface.
PCollection
to the given Target
,
using the storage format specified by the target.
PCollection
to the given Target
,
using the given Target.WriteMode
to handle existing
targets.
PCollection
to the given Target
,
using the storage format specified by the target and the given
WriteMode
for cases where the referenced Target
already exists.
PTable
to the given Target
.
PTable
to the given Target
, using the
given Target.WriteMode
to handle existing targets.
out
.
Tuple
with a constructor that
has the given extractor types that uses the given TokenizerFactory
for parsing the sub-fields.
TokenizerFactory
for parsing the sub-fields.
TokenizerFactory
for parsing the sub-fields.
TokenizerFactory
for parsing the sub-fields.
TokenizerFactory
for parsing the sub-fields.