public interface PType<T> extends Serializable

PType defines a mapping between a data type that is used in a Crunch pipeline and a
serialization and storage format that is used to read/write data from/to HDFS. Every
PCollection has an associated PType that tells Crunch how to read/write data from
that PCollection.

Modifier and Type | Method and Description |
---|---|
Converter | getConverter() |
SourceTarget<T> | getDefaultFileSource(org.apache.hadoop.fs.Path path): Returns a SourceTarget that is able to read/write data using the serialization format specified by this PType. |
T | getDetachedValue(T value): Returns a copy of a value (or the value itself) that can safely be retained. |
PTypeFamily | getFamily(): Returns the PTypeFamily that this PType belongs to. |
MapFn<Object,T> | getInputMapFn() |
MapFn<T,Object> | getOutputMapFn() |
List<PType> | getSubTypes(): Returns the sub-types that make up this PType if it is a composite instance, such as a tuple. |
Class<T> | getTypeClass(): Returns the Java type represented by this PType. |
void | initialize(org.apache.hadoop.conf.Configuration conf): Initialize this PType for use within a DoFn. |

PTypeFamily getFamily()
Returns the PTypeFamily that this PType belongs to.

Converter getConverter()

void initialize(org.apache.hadoop.conf.Configuration conf)
Initialize this PType for use within a DoFn.
Parameters:
conf - Configuration object
See Also:
getDetachedValue(Object)

T getDetachedValue(T value)
Returns a copy of a value (or the value itself) that can safely be retained.

This is useful when iterable values being processed in a DoFn (via a reducer) need to be held
on to for more than the scope of a single iteration, as a reducer (and therefore also a DoFn
that has an Iterable as input) re-uses deserialized values. More information on object reuse is
available in the DoFn class documentation.
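
The object-reuse hazard described above can be illustrated without Crunch at all. The following is a minimal sketch in plain Java: `DetachSketch`, `Holder`, `reusingIterable`, and `retain` are hypothetical names invented for this illustration and are not part of the Crunch API. The copying lambda only stands in for the kind of detaching that getDetachedValue is described as providing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.UnaryOperator;

public class DetachSketch {
    /** Hypothetical mutable value, standing in for a deserialized record. */
    static final class Holder {
        int id;
        Holder(int id) { this.id = id; }
    }

    /**
     * Hypothetical iterable that overwrites and returns one shared Holder on
     * every step, mimicking the value reuse the text attributes to reducers.
     */
    static Iterable<Holder> reusingIterable(int... ids) {
        Holder shared = new Holder(0);
        return () -> new Iterator<Holder>() {
            private int next = 0;
            @Override public boolean hasNext() { return next < ids.length; }
            @Override public Holder next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.id = ids[next++];   // same instance, new contents
                return shared;
            }
        };
    }

    /** Holds on to every value past its iteration, after applying detach. */
    static List<Integer> retain(Iterable<Holder> values, UnaryOperator<Holder> detach) {
        List<Holder> held = new ArrayList<>();
        for (Holder h : values) {
            held.add(detach.apply(h));
        }
        List<Integer> ids = new ArrayList<>();
        for (Holder h : held) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Retaining the reused instance: every entry sees the last value.
        System.out.println(retain(reusingIterable(1, 2, 3), h -> h));
        // prints [3, 3, 3]

        // Retaining a copy of each value: the contents survive the iteration.
        System.out.println(retain(reusingIterable(1, 2, 3), h -> new Holder(h.id)));
        // prints [1, 2, 3]
    }
}
```

In a real pipeline the copy would come from the PType rather than a hand-written lambda; per the method summary, initialize(Configuration) is the call that prepares a PType for use within a DoFn.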
Parameters:
value - The value to be deep-copied

SourceTarget<T> getDefaultFileSource(org.apache.hadoop.fs.Path path)
Returns a SourceTarget that is able to read/write data using the serialization format
specified by this PType.

Copyright © 2013 The Apache Software Foundation. All Rights Reserved.