public abstract class FilterFn<T> extends DoFn<T,T>
DoFn
for the common case of filtering the members of a
PCollection
based on a boolean condition.Constructor and Description |
---|
FilterFn() |
Modifier and Type | Method and Description |
---|---|
abstract boolean |
accept(T input)
If true, emit the given record.
|
void |
cleanup()
Called during the cleanup of the MapReduce job this
FilterFn is
associated with. |
void |
cleanup(Emitter<T> emitter)
Called during the cleanup of the MapReduce job this
DoFn is
associated with. |
void |
process(T input,
Emitter<T> emitter)
Processes the records from a
PCollection . |
float |
scaleFactor()
Returns an estimate of how applying this function to a
PCollection
will cause it to change in side. |
configure, disableDeepCopy, initialize, setConfiguration, setContext
public abstract boolean accept(T input)
public void process(T input, Emitter<T> emitter)
DoFn
PCollection
.
DoFn.process(Object, Emitter)
method call. This
functionality is imposed by Hadoop's Reducer implementation: The framework will reuse the key and value
objects that are passed into the reduce, therefore the application should
clone the objects they want to keep a copy of.public final void cleanup(Emitter<T> emitter)
DoFn
DoFn
is
associated with. Subclasses may override this method to do appropriate
cleanup.public void cleanup()
FilterFn
is
associated with. Subclasses may override this method to do appropriate
cleanup.public float scaleFactor()
DoFn
PCollection
will cause it to change in side. The optimizer uses these estimates to
decide where to break up dependent MR jobs into separate Map and Reduce
phases in order to minimize I/O.
Subclasses of DoFn
that will substantially alter the size of the
resulting PCollection
should override this method.
scaleFactor
in class DoFn<T,T>
Copyright © 2016 The Apache Software Foundation. All rights reserved.