org.apache.crunch
Class MapFn<S,T>

java.lang.Object
  extended by org.apache.crunch.DoFn<S,T>
      extended by org.apache.crunch.MapFn<S,T>
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
CompositeMapFn, ExtractKeyFn, IdentityFn, PairMapFn, PGroupedTableType.PairIterableMapFn, SortFns.AvroGenericFn, SortFns.SingleKeyFn, SortFns.TupleKeyFn

public abstract class MapFn<S,T>
extends DoFn<S,T>

A DoFn for the common case of emitting exactly one value for each input record.

See Also:
Serialized Form

Constructor Summary
MapFn()
           
 
Method Summary
abstract  T map(S input)
          Maps the given input into an instance of the output type.
 void process(S input, Emitter<T> emitter)
          Processes the records from a PCollection.
 float scaleFactor()
          Returns an estimate of how applying this function to a PCollection will cause it to change in size.
 
Methods inherited from class org.apache.crunch.DoFn
cleanup, configure, disableDeepCopy, initialize, setContext
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MapFn

public MapFn()

Method Detail

map

public abstract T map(S input)
Maps the given input into an instance of the output type.
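The one-value-per-record contract can be illustrated with a minimal, self-contained sketch. The types SketchEmitter, SketchDoFn, SketchMapFn, and LengthFn below are hypothetical stand-ins written for this example, not the Crunch classes; the sketch assumes that process simply emits the result of map once per input record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for Emitter, DoFn and MapFn; not the Crunch classes.
interface SketchEmitter<T> {
    void emit(T value);
}

abstract class SketchDoFn<S, T> {
    abstract void process(S input, SketchEmitter<T> emitter);
}

abstract class SketchMapFn<S, T> extends SketchDoFn<S, T> {
    abstract T map(S input);

    // Assumed behavior: one input record yields exactly one emitted value.
    @Override
    void process(S input, SketchEmitter<T> emitter) {
        emitter.emit(map(input));
    }
}

// Example subclass: maps each string to its length.
class LengthFn extends SketchMapFn<String, Integer> {
    @Override
    Integer map(String input) {
        return input.length();
    }
}

class MapFnSketch {
    // Feeds each input through process and collects what was emitted.
    static List<Integer> run(List<String> inputs) {
        List<Integer> out = new ArrayList<>();
        SketchDoFn<String, Integer> fn = new LengthFn();
        for (String s : inputs) {
            fn.process(s, out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "bb", "ccc"))); // prints [1, 2, 3]
    }
}
```

With a subclass like this, only map needs to be written; the emitter is never touched directly.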


process

public void process(S input,
                    Emitter<T> emitter)
Description copied from class: DoFn
Processes the records from a PCollection.

Note: Crunch can reuse a single input record object, changing its content on each DoFn.process(Object, Emitter) call. This behavior comes from Hadoop's Reducer implementation: the framework reuses the key and value objects passed into reduce, so an application should clone any object it wants to keep a copy of.

Specified by:
process in class DoFn<S,T>
Parameters:
input - The input record.
emitter - The emitter to send the output to
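The object-reuse note above can be made concrete with a small sketch. ReusedRecord and ReuseSketch are hypothetical names invented for this example; the loop only imitates a framework that overwrites one shared object between calls, and it does not use any Hadoop or Crunch API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable record, standing in for a reused key or value object.
class ReusedRecord {
    String text;

    ReusedRecord(String text) {
        this.text = text;
    }

    ReusedRecord copy() {
        return new ReusedRecord(text);
    }
}

class ReuseSketch {
    // Keeps references to the shared object: every entry ends up pointing
    // at the same instance, which holds whatever was written last.
    static List<String> keepReferences(String[] values) {
        ReusedRecord shared = new ReusedRecord(null);
        List<ReusedRecord> kept = new ArrayList<>();
        for (String v : values) {
            shared.text = v;  // imitates the framework overwriting the object
            kept.add(shared); // bug: no copy is made
        }
        return texts(kept);
    }

    // Copies the record before keeping it, so each entry retains its own value.
    static List<String> keepCopies(String[] values) {
        ReusedRecord shared = new ReusedRecord(null);
        List<ReusedRecord> kept = new ArrayList<>();
        for (String v : values) {
            shared.text = v;
            kept.add(shared.copy());
        }
        return texts(kept);
    }

    private static List<String> texts(List<ReusedRecord> records) {
        List<String> out = new ArrayList<>();
        for (ReusedRecord r : records) {
            out.add(r.text);
        }
        return out;
    }
}
```

For the inputs "a", "b", "c", keepReferences returns [c, c, c] while keepCopies returns [a, b, c], which is why records held beyond a single process call should be copied first.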

scaleFactor

public float scaleFactor()
Description copied from class: DoFn
Returns an estimate of how applying this function to a PCollection will cause it to change in size. The optimizer uses these estimates to decide where to break up dependent MR jobs into separate Map and Reduce phases in order to minimize I/O.

Subclasses of DoFn that will substantially alter the size of the resulting PCollection should override this method.

Overrides:
scaleFactor in class DoFn<S,T>
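As a rough illustration of overriding the estimate, the sketch below uses a hypothetical stand-in class, SizedFn, rather than the real DoFn. Its default of 1.0f is a placeholder chosen for the example, and estimateOutputSize is an invented helper; neither reflects how the Crunch planner actually applies the value.

```java
// Hypothetical stand-in for DoFn; the 1.0f default is a placeholder,
// not necessarily the value Crunch uses.
abstract class SizedFn<S, T> {
    float scaleFactor() {
        return 1.0f;
    }
}

// A function expected to drop most of its input overrides the estimate.
class SparseFilterFn extends SizedFn<String, String> {
    @Override
    float scaleFactor() {
        return 0.1f;
    }
}

class ScaleFactorSketch {
    // Invented helper: scales an input size by the function's estimate.
    static long estimateOutputSize(long inputSize, SizedFn<?, ?> fn) {
        return Math.round(inputSize * (double) fn.scaleFactor());
    }
}
```

Under these assumptions, an input of 1000 units would be estimated at 100 units after SparseFilterFn, and left at 1000 for a function that keeps the placeholder default.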


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.