This project has retired. For details please refer to its Attic page.
PType (Apache Crunch 0.10.0 API)

org.apache.crunch.types
Interface PType<T>

All Superinterfaces:
Serializable
All Known Subinterfaces:
PTableType<K,V>
All Known Implementing Classes:
AvroType, PGroupedTableType, WritableType

public interface PType<T>
extends Serializable

A PType defines a mapping between a data type that is used in a Crunch pipeline and a serialization and storage format that is used to read/write data from/to HDFS. Every PCollection has an associated PType that tells Crunch how to read/write data from that PCollection.
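The mapping described above can be pictured with a small, self-contained sketch. MiniType below is a hypothetical, much-reduced stand-in for PType and is not part of Crunch; it borrows only the method names getTypeClass, getInputMapFn and getOutputMapFn from this page, and uses java.util.function.Function in place of MapFn. The conversion directions (input: stored form to in-memory value, output: in-memory value to stored form) are an assumption inferred from the signatures, since this page does not document them.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical illustration only: MiniType is NOT a Crunch interface.
class TypeMappingSketch {

    /** A much-reduced analogue of PType<T>: a Java class plus two conversion functions. */
    interface MiniType<T> {
        Class<T> getTypeClass();
        Function<Object, T> getInputMapFn();   // assumed direction: stored form -> in-memory value
        Function<T, Object> getOutputMapFn();  // assumed direction: in-memory value -> stored form
    }

    /** Example mapping that stores Strings as UTF-8 byte arrays. */
    static final MiniType<String> UTF8_STRINGS = new MiniType<String>() {
        @Override public Class<String> getTypeClass() { return String.class; }
        @Override public Function<Object, String> getInputMapFn() {
            return stored -> new String((byte[]) stored, StandardCharsets.UTF_8);
        }
        @Override public Function<String, Object> getOutputMapFn() {
            return value -> value.getBytes(StandardCharsets.UTF_8);
        }
    };

    /** Converts a value to its stored form and reads it back again. */
    static <T> T roundTrip(MiniType<T> type, T value) {
        Object stored = type.getOutputMapFn().apply(value);
        return type.getInputMapFn().apply(stored);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(UTF8_STRINGS, "crunch"));
    }
}
```

The round trip returns a value equal to the original, which is the property a pipeline relies on when it writes a collection out and reads it back.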


Method Summary
 Converter getConverter()
           
 ReadableSourceTarget<T> getDefaultFileSource(org.apache.hadoop.fs.Path path)
          Returns a SourceTarget that is able to read/write data using the serialization format specified by this PType.
 T getDetachedValue(T value)
          Returns a copy of a value (or the value itself) that can safely be retained.
 PTypeFamily getFamily()
          Returns the PTypeFamily that this PType belongs to.
 MapFn<Object,T> getInputMapFn()
           
 MapFn<T,Object> getOutputMapFn()
           
 List<PType> getSubTypes()
          Returns the sub-types that make up this PType if it is a composite instance, such as a tuple.
 Class<T> getTypeClass()
          Returns the Java type represented by this PType.
 void initialize(org.apache.hadoop.conf.Configuration conf)
          Initialize this PType for use within a DoFn.
 

Method Detail

getTypeClass

Class<T> getTypeClass()
Returns the Java type represented by this PType.


getFamily

PTypeFamily getFamily()
Returns the PTypeFamily that this PType belongs to.


getInputMapFn

MapFn<Object,T> getInputMapFn()

getOutputMapFn

MapFn<T,Object> getOutputMapFn()

getConverter

Converter getConverter()

initialize

void initialize(org.apache.hadoop.conf.Configuration conf)
Initialize this PType for use within a DoFn. This generally only needs to be called when the PType will be used to call getDetachedValue(Object).

Parameters:
conf - Configuration object
See Also:
getDetachedValue(Object)

getDetachedValue

T getDetachedValue(T value)
Returns a copy of a value (or the value itself) that can safely be retained.

This is useful when values from an Iterable being processed in a DoFn (via a reducer) need to be held on to beyond the scope of a single iteration, because a reducer (and therefore also any DoFn that has an Iterable as input) reuses deserialized value objects. More information on object reuse is available in the DoFn class documentation.

Parameters:
value - The value to be detached (deep-copied if necessary)
Returns:
A deep copy of the input value, or the value itself if it can safely be retained
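The object-reuse problem that getDetachedValue addresses can be reproduced without Hadoop or Crunch. The sketch below is a plain-Java simulation under stated assumptions: MutableValue, reusingIterable and the copy() method are hypothetical names introduced for illustration, with copy() playing the role that getDetachedValue plays for a real PType. In a real DoFn, the PType would first be set up with initialize(Configuration), as noted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical simulation only: none of these classes are part of Crunch.
class DetachedValueSketch {

    /** A mutable value, standing in for a deserialized object that the framework reuses. */
    static final class MutableValue {
        String name;
        MutableValue(String name) { this.name = name; }
        MutableValue copy() { return new MutableValue(name); }
    }

    /** Returns an Iterable that hands out the SAME instance on every step, overwriting its contents. */
    static Iterable<MutableValue> reusingIterable(final List<String> names) {
        final MutableValue shared = new MutableValue(null);
        return () -> new Iterator<MutableValue>() {
            private int index = 0;
            @Override public boolean hasNext() { return index < names.size(); }
            @Override public MutableValue next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.name = names.get(index++);
                return shared;
            }
        };
    }

    /** Retains references directly; every retained element aliases the shared instance. */
    static List<String> retainWithoutCopy(Iterable<MutableValue> values) {
        List<MutableValue> kept = new ArrayList<>();
        for (MutableValue v : values) kept.add(v);
        return namesOf(kept);
    }

    /** Retains a copy of each value; copy() stands in for getDetachedValue. */
    static List<String> retainWithCopy(Iterable<MutableValue> values) {
        List<MutableValue> kept = new ArrayList<>();
        for (MutableValue v : values) kept.add(v.copy());
        return namesOf(kept);
    }

    private static List<String> namesOf(List<MutableValue> values) {
        List<String> out = new ArrayList<>();
        for (MutableValue v : values) out.add(v.name);
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        System.out.println(retainWithoutCopy(reusingIterable(input))); // prints [c, c, c]
        System.out.println(retainWithCopy(reusingIterable(input)));    // prints [a, b, c]
    }
}
```

Without the copy, all three retained references point at the one shared object, so only the last value survives. Copying on each iteration costs an allocation per element, which is why detaching is only worthwhile when values must outlive the iteration.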

getDefaultFileSource

ReadableSourceTarget<T> getDefaultFileSource(org.apache.hadoop.fs.Path path)
Returns a SourceTarget that is able to read/write data using the serialization format specified by this PType.


getSubTypes

List<PType> getSubTypes()
Returns the sub-types that make up this PType if it is a composite instance, such as a tuple.
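As a rough picture of what a composite type's sub-types look like, the following sketch walks a nested type description and collects its leaf types. Node is a hypothetical stand-in and not a Crunch class; only the method name getSubTypes is taken from this page. The sketch also assumes that a non-composite type reports an empty list, which this page does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: Node is NOT a Crunch class.
class SubTypesSketch {

    /** A type description that is either a leaf (no sub-types) or a composite such as a pair. */
    static final class Node {
        final String name;
        final List<Node> subTypes;
        Node(String name, Node... subTypes) {
            this.name = name;
            this.subTypes = Collections.unmodifiableList(Arrays.asList(subTypes));
        }
        // Assumption: leaves return an empty list rather than null.
        List<Node> getSubTypes() { return subTypes; }
    }

    /** Collects the names of all leaf types, depth-first and left to right. */
    static List<String> leafNames(Node type) {
        List<String> out = new ArrayList<>();
        collect(type, out);
        return out;
    }

    private static void collect(Node type, List<String> out) {
        if (type.getSubTypes().isEmpty()) {
            out.add(type.name);
            return;
        }
        for (Node sub : type.getSubTypes()) {
            collect(sub, out);
        }
    }

    public static void main(String[] args) {
        // pair(string, pair(int, long))
        Node nested = new Node("pair",
                new Node("string"),
                new Node("pair", new Node("int"), new Node("long")));
        System.out.println(leafNames(nested)); // prints [string, int, long]
    }
}
```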



Copyright © 2014 The Apache Software Foundation. All Rights Reserved.