public interface PType<T> extends Serializable
PType defines a mapping between a data type that is used in a Crunch pipeline and a serialization and storage format that is used to read/write data from/to HDFS. Every PCollection has an associated PType that tells Crunch how to read/write data from that PCollection.

Modifier and Type | Method and Description |
---|---|
ReadableSource<T> | createSourceTarget(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path path, Iterable<T> contents, int parallelism) Returns a ReadableSource that contains the data in the given Iterable. |
Converter | getConverter() |
ReadableSourceTarget<T> | getDefaultFileSource(org.apache.hadoop.fs.Path path) Returns a SourceTarget that is able to read/write data using the serialization format specified by this PType. |
T | getDetachedValue(T value) Returns a copy of a value (or the value itself) that can safely be retained. |
PTypeFamily | getFamily() Returns the PTypeFamily that this PType belongs to. |
MapFn<Object,T> | getInputMapFn() |
MapFn<T,Object> | getOutputMapFn() |
List<PType> | getSubTypes() Returns the sub-types that make up this PType if it is a composite instance, such as a tuple. |
Class<T> | getTypeClass() Returns the Java type represented by this PType. |
void | initialize(org.apache.hadoop.conf.Configuration conf) Initialize this PType for use within a DoFn. |

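
The mapping idea behind getInputMapFn() and getOutputMapFn() can be illustrated with a small, self-contained sketch. It does not use the Crunch API: SketchType, read, write, and utf8Strings are hypothetical names, and the choice of UTF-8 bytes as the stored form is an assumption made only for illustration. The sketch pairs a Java type with one function that converts the stored form into a value and another that converts a value back into the stored form.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

class SketchTypeDemo {
    // Hypothetical, simplified stand-in for the PType idea; NOT the Crunch interface.
    static final class SketchType<T> {
        private final Class<T> typeClass;
        private final Function<Object, T> inputMapFn;   // stored form -> in-memory value
        private final Function<T, Object> outputMapFn;  // in-memory value -> stored form

        SketchType(Class<T> typeClass,
                   Function<Object, T> inputMapFn,
                   Function<T, Object> outputMapFn) {
            this.typeClass = typeClass;
            this.inputMapFn = inputMapFn;
            this.outputMapFn = outputMapFn;
        }

        Class<T> getTypeClass() { return typeClass; }

        T read(Object stored) { return inputMapFn.apply(stored); }

        Object write(T value) { return outputMapFn.apply(value); }

        // Assumption for illustration: Strings are stored as UTF-8 byte arrays.
        static SketchType<String> utf8Strings() {
            return new SketchType<String>(
                String.class,
                stored -> new String((byte[]) stored, StandardCharsets.UTF_8),
                value -> value.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) {
        SketchType<String> strings = SketchType.utf8Strings();
        Object stored = strings.write("crunch");   // in-memory value to stored form
        String restored = strings.read(stored);    // stored form back to in-memory value
        System.out.println(restored.equals("crunch"));
    }
}
```

In Crunch itself the stored form and the conversion functions are supplied by the PType implementation; the sketch only shows the round-trip contract between the two directions.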
PTypeFamily getFamily()
Returns the PTypeFamily that this PType belongs to.

Converter getConverter()

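void initialize(org.apache.hadoop.conf.Configuration conf)
Initialize this PType for use within a DoFn. This generally only needs to be called when using a PType for getDetachedValue(Object).
Parameters:
conf - Configuration object
See Also:
getDetachedValue(Object)
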
T getDetachedValue(T value)
Returns a copy of a value (or the value itself) that can safely be retained.

This is useful when iterable values being processed in a DoFn (via a reducer) need to be held on to for more than the scope of a single iteration, because a reducer (and therefore also a DoFn that has an Iterable as input) reuses deserialized values. More information on object reuse is available in the DoFn class documentation.
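
The object-reuse problem described above can be reproduced without Crunch. The following is a minimal sketch under stated assumptions: Record, reusingIterable, and detach are hypothetical names, and detach only stands in for the kind of deep copy that getDetachedValue is documented to provide. The iterable hands back one shared instance on every step, so values retained without copying all end up showing the last element.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class DetachedValueDemo {
    // Mutable holder standing in for a deserialized record that gets reused.
    static final class Record {
        String text;
        Record(String text) { this.text = text; }
    }

    // Hypothetical iterable that returns ONE shared instance and overwrites
    // its contents on each call to next(), mimicking reducer-style reuse.
    static Iterable<Record> reusingIterable(List<String> data) {
        Record shared = new Record(null);
        return () -> new Iterator<Record>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < data.size(); }
            @Override public Record next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.text = data.get(i++);
                return shared;
            }
        };
    }

    // Hypothetical stand-in for a detached value: a deep copy of the record.
    static Record detach(Record value) { return new Record(value.text); }

    // Retains the reused instance itself, so every kept entry aliases it.
    static List<String> retainWithoutDetach(List<String> data) {
        List<Record> kept = new ArrayList<>();
        for (Record r : reusingIterable(data)) kept.add(r);
        return texts(kept);
    }

    // Retains a copy per element, so each kept entry is independent.
    static List<String> retainWithDetach(List<String> data) {
        List<Record> kept = new ArrayList<>();
        for (Record r : reusingIterable(data)) kept.add(detach(r));
        return texts(kept);
    }

    private static List<String> texts(List<Record> records) {
        List<String> out = new ArrayList<>();
        for (Record r : records) out.add(r.text);
        return out;
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c");
        System.out.println(retainWithoutDetach(data)); // [c, c, c]
        System.out.println(retainWithDetach(data));    // [a, b, c]
    }
}
```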
Parameters:
value - The value to be deep-copied

ReadableSourceTarget<T> getDefaultFileSource(org.apache.hadoop.fs.Path path)
Returns a SourceTarget that is able to read/write data using the serialization format specified by this PType.

ReadableSource<T> createSourceTarget(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path path, Iterable<T> contents, int parallelism) throws IOException
Returns a ReadableSource that contains the data in the given Iterable.
Parameters:
conf - The Configuration to use
path - The path to write the data to
contents - The contents to write
parallelism - The desired parallelism
Throws:
IOException
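
The general shape of createSourceTarget (write the given contents to a path, then hand back something that can read them) can be sketched with plain java.nio. This is only an analogy, not how Crunch implements it: ReadableLines and writeThenExpose are hypothetical names, the sketch uses java.nio.file.Path rather than the Hadoop Path, stores the values as UTF-8 text lines by assumption, and leaves out the Configuration and parallelism arguments entirely.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class SourceTargetSketch {
    // Hypothetical readable handle over data that has already been written.
    interface ReadableLines {
        List<String> read() throws IOException;
    }

    // Writes the contents to the given path, one value per line, and returns
    // a handle that reads them back from that same path.
    static ReadableLines writeThenExpose(Path path, Iterable<String> contents) throws IOException {
        Files.write(path, contents, StandardCharsets.UTF_8);
        return () -> Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("sketch", ".txt");
        try {
            ReadableLines source = writeThenExpose(path, List.of("x", "y"));
            System.out.println(source.read());
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```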
Copyright © 2016 The Apache Software Foundation. All rights reserved.