public abstract class MapFn<S,T> extends DoFn<S,T>
A DoFn for the common case of emitting exactly one value for each
input record.

| Constructor and Description |
|---|
| MapFn() |
| Modifier and Type | Method and Description |
|---|---|
| abstract T | map(S input): Maps the given input into an instance of the output type. |
| void | process(S input, Emitter<T> emitter): Processes the records from a PCollection. |
| float | scaleFactor(): Returns an estimate of how applying this function to a PCollection will cause it to change in size. |
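The relationship between map and process summarized above can be sketched as follows. This is a minimal, self-contained illustration and not the Crunch source: Emitter, DoFn and MapFn here are simplified stand-ins nested inside a hypothetical MapFnSketch class, and ToUpperCaseFn and apply are invented for the example. The sketch assumes that process simply emits the single value returned by map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class MapFnSketch {
    // Stand-in for Crunch's Emitter (assumption: a single emit method).
    interface Emitter<T> {
        void emit(T value);
    }

    // Stand-in for DoFn: may emit zero or more values per input record.
    abstract static class DoFn<S, T> {
        public abstract void process(S input, Emitter<T> emitter);
    }

    // Stand-in for MapFn: exactly one output value per input record.
    abstract static class MapFn<S, T> extends DoFn<S, T> {
        public abstract T map(S input);

        @Override
        public void process(S input, Emitter<T> emitter) {
            // Assumed behavior: delegate to map() and emit its result once.
            emitter.emit(map(input));
        }
    }

    // Hypothetical subclass: only map() needs to be implemented.
    static class ToUpperCaseFn extends MapFn<String, String> {
        @Override
        public String map(String input) {
            return input.toUpperCase(Locale.ROOT);
        }
    }

    // Hypothetical driver: feeds each record to the function and collects what it emits.
    static <S, T> List<T> apply(DoFn<S, T> fn, List<S> inputs) {
        List<T> out = new ArrayList<>();
        for (S in : inputs) {
            fn.process(in, out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(apply(new ToUpperCaseFn(), Arrays.asList("a", "b")));
    }
}
```

Because the subclass only supplies map, the output list always has the same number of elements as the input list.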
Methods inherited from class DoFn: cleanup, configure, disableDeepCopy, initialize, setConfiguration, setContext

public void process(S input, Emitter<T> emitter)
Description copied from class: DoFn
Processes the records from a PCollection.
Note: the input record object may be reused, with its contents changing on each
DoFn.process(Object, Emitter) method call. This
functionality is imposed by Hadoop's Reducer implementation: the framework reuses the key and value
objects that are passed into the reduce, so the application should
clone any objects it wants to keep a copy of.

public float scaleFactor()
Description copied from class: DoFn
Returns an estimate of how applying this function to a PCollection
will cause it to change in size. The optimizer uses these estimates to
decide where to break up dependent MR jobs into separate Map and Reduce
phases in order to minimize I/O.
Subclasses of DoFn that will substantially alter the size of the
resulting PCollection should override this method.
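As an illustration of overriding the estimate, the sketch below uses a simplified stand-in for DoFn rather than the Crunch class. The names SizedFn, SampleOneInTenFn, ExpandNGramsFn and estimateOutputBytes are hypothetical, the default of 1.0f is an assumption made for this example, and the multiplication only illustrates what a scale factor means; it is not how the Crunch planner is implemented.

```java
class ScaleFactorSketch {
    // Stand-in for a DoFn-like base class (assumption, not the Crunch class).
    abstract static class SizedFn<S, T> {
        // Assumed default for this sketch: output is about the same size as input.
        public float scaleFactor() {
            return 1.0f;
        }
    }

    // Hypothetical function that keeps roughly one record in ten.
    static class SampleOneInTenFn extends SizedFn<String, String> {
        @Override
        public float scaleFactor() {
            return 0.1f;
        }
    }

    // Hypothetical function that emits about three values per input record.
    static class ExpandNGramsFn extends SizedFn<String, String> {
        @Override
        public float scaleFactor() {
            return 3.0f;
        }
    }

    // Illustrative estimate: output size = input size * scale factor, rounded.
    static long estimateOutputBytes(long inputBytes, SizedFn<?, ?> fn) {
        return Math.round((double) inputBytes * fn.scaleFactor());
    }

    public static void main(String[] args) {
        System.out.println(estimateOutputBytes(1000L, new SampleOneInTenFn()));
        System.out.println(estimateOutputBytes(1000L, new ExpandNGramsFn()));
    }
}
```

A function that shrinks or grows its input substantially reports that through the override, so a planner comparing the two estimates could tell which intermediate result is cheaper to write out.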
Overrides: scaleFactor in class DoFn<S,T>

Copyright © 2015 The Apache Software Foundation. All Rights Reserved.