org.apache.crunch
Class FilterFn<T>
java.lang.Object
org.apache.crunch.DoFn<T,T>
org.apache.crunch.FilterFn<T>
- All Implemented Interfaces:
- Serializable
- Direct Known Subclasses:
- FilterFn.AndFn, FilterFn.NotFn, FilterFn.OrFn
public abstract class FilterFn<T>
- extends DoFn<T,T>
A DoFn for the common case of filtering the members of a
PCollection based on a boolean condition.
- See Also:
- Serialized Form
FilterFn
public FilterFn()
accept
public abstract boolean accept(T input)
- Returns true if the given record should be emitted.
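A minimal sketch of an accept implementation follows. So that it compiles without Crunch on the classpath, it uses a hypothetical stand-in class (SketchFilterFn) that declares only accept; in a real pipeline the subclass would extend org.apache.crunch.FilterFn<String> instead. The NonBlankFilter name and its condition are illustrative assumptions.

```java
// Hypothetical stand-in for org.apache.crunch.FilterFn, reduced to accept().
abstract class SketchFilterFn<T> {
    public abstract boolean accept(T input);
}

// Illustrative filter: keeps only lines that contain non-whitespace text.
class NonBlankFilter extends SketchFilterFn<String> {
    @Override
    public boolean accept(String input) {
        return input != null && !input.trim().isEmpty();
    }
}

class AcceptSketch {
    public static void main(String[] args) {
        NonBlankFilter filter = new NonBlankFilter();
        System.out.println(filter.accept("hello")); // prints true
        System.out.println(filter.accept("   "));   // prints false
    }
}
```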
process
public void process(T input,
Emitter<T> emitter)
- Description copied from class:
DoFn
- Processes the records from a
PCollection.
Note: Crunch can reuse a single input record object whose content
changes on each DoFn.process(Object, Emitter) method call. This
behavior is imposed by Hadoop's Reducer implementation: "The framework will reuse the key and value
objects that are passed into the reduce, therefore the application should
clone the objects they want to keep a copy of."
- Specified by:
process in class DoFn<T,T>
- Parameters:
input - The input record.
emitter - The emitter to send the output to.
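The interplay between process, accept, and record reuse can be sketched as below. This is not Crunch's source: SketchEmitter and ProcessSketchFn are hypothetical stand-ins, and the sketch assumes that process forwards a record to the emitter exactly when accept returns true. The loop reuses one StringBuilder to mimic a framework that recycles the input object, and the collector copies the content before it is overwritten.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.crunch.Emitter.
interface SketchEmitter<T> {
    void emit(T value);
}

// Hypothetical stand-in for FilterFn: forwards a record only if accept() is true.
abstract class ProcessSketchFn<T> {
    public abstract boolean accept(T input);

    public void process(T input, SketchEmitter<T> emitter) {
        if (accept(input)) {
            emitter.emit(input);
        }
    }
}

class ProcessSketch {
    static List<String> run(String[] records) {
        ProcessSketchFn<StringBuilder> longerThanThree = new ProcessSketchFn<StringBuilder>() {
            @Override
            public boolean accept(StringBuilder input) {
                return input.length() > 3;
            }
        };

        List<String> kept = new ArrayList<>();
        // toString() snapshots the content, so later reuse of the buffer
        // cannot change what was already collected.
        SketchEmitter<StringBuilder> collector = value -> kept.add(value.toString());

        // One mutable object is reused for every record.
        StringBuilder reused = new StringBuilder();
        for (String record : records) {
            reused.setLength(0);
            reused.append(record);
            longerThanThree.process(reused, collector);
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [abcd, hello]
        System.out.println(run(new String[] {"ab", "abcd", "xyz", "hello"}));
    }
}
```

Had the collector stored the StringBuilder reference itself, every kept entry would end up showing the last record written to the buffer, which is the hazard the note above warns about.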
scaleFactor
public float scaleFactor()
- Description copied from class:
DoFn
- Returns an estimate of how applying this function to a
PCollection
will cause it to change in size. The optimizer uses these estimates to
decide where to break up dependent MR jobs into separate Map and Reduce
phases in order to minimize I/O.
Subclasses of DoFn that will substantially alter the size of the
resulting PCollection should override this method.
- Overrides:
scaleFactor in class DoFn<T,T>
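As a rough illustration of overriding scaleFactor for a highly selective filter, consider the sketch below. ScaledFilterFn is a hypothetical stand-in, its default of 1.0f is a placeholder rather than Crunch's actual value, and estimateOutputSize assumes the estimate is applied as a simple multiplier on the input size, which may not match the planner exactly.

```java
// Hypothetical stand-in for org.apache.crunch.FilterFn, reduced to the two
// methods this sketch needs.
abstract class ScaledFilterFn<T> {
    public abstract boolean accept(T input);

    public float scaleFactor() {
        return 1.0f; // placeholder default for this sketch, not Crunch's value
    }
}

// Illustrative filter that is expected to keep about 1% of its input.
class FatalLineFilter extends ScaledFilterFn<String> {
    @Override
    public boolean accept(String line) {
        return line.startsWith("FATAL");
    }

    @Override
    public float scaleFactor() {
        return 0.01f;
    }
}

class ScaleFactorSketch {
    // Assumed model: estimated output size = input size * scale factor.
    static long estimateOutputSize(long inputSize, ScaledFilterFn<?> fn) {
        return Math.round((double) inputSize * fn.scaleFactor());
    }

    public static void main(String[] args) {
        // prints 10000
        System.out.println(estimateOutputSize(1_000_000L, new FatalLineFilter()));
    }
}
```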
and
public static <S> FilterFn<S> and(FilterFn<S>... fns)
or
public static <S> FilterFn<S> or(FilterFn<S>... fns)
not
public static <S> FilterFn<S> not(FilterFn<S> fn)
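The page gives no description for and, or, and not, so the sketch below assumes the conventional semantics suggested by the names: and accepts a record when every supplied filter accepts it, or accepts when at least one does, and not inverts a single filter. ComboFilterFn, IsPositive, and IsEven are hypothetical stand-ins written so the example compiles without Crunch; only the three method signatures mirror the ones listed above.

```java
// Hypothetical stand-in for org.apache.crunch.FilterFn with the three combinators.
abstract class ComboFilterFn<T> {
    public abstract boolean accept(T input);

    @SafeVarargs
    public static <S> ComboFilterFn<S> and(final ComboFilterFn<S>... fns) {
        return new ComboFilterFn<S>() {
            @Override
            public boolean accept(S input) {
                for (ComboFilterFn<S> fn : fns) {
                    if (!fn.accept(input)) {
                        return false; // one rejection is enough
                    }
                }
                return true;
            }
        };
    }

    @SafeVarargs
    public static <S> ComboFilterFn<S> or(final ComboFilterFn<S>... fns) {
        return new ComboFilterFn<S>() {
            @Override
            public boolean accept(S input) {
                for (ComboFilterFn<S> fn : fns) {
                    if (fn.accept(input)) {
                        return true; // one acceptance is enough
                    }
                }
                return false;
            }
        };
    }

    public static <S> ComboFilterFn<S> not(final ComboFilterFn<S> fn) {
        return new ComboFilterFn<S>() {
            @Override
            public boolean accept(S input) {
                return !fn.accept(input);
            }
        };
    }
}

class IsPositive extends ComboFilterFn<Integer> {
    @Override
    public boolean accept(Integer x) {
        return x > 0;
    }
}

class IsEven extends ComboFilterFn<Integer> {
    @Override
    public boolean accept(Integer x) {
        return x % 2 == 0;
    }
}

class CombinatorSketch {
    public static void main(String[] args) {
        ComboFilterFn<Integer> positiveAndEven = ComboFilterFn.and(new IsPositive(), new IsEven());
        ComboFilterFn<Integer> positiveOrEven = ComboFilterFn.or(new IsPositive(), new IsEven());
        ComboFilterFn<Integer> odd = ComboFilterFn.not(new IsEven());

        System.out.println(positiveAndEven.accept(4)); // prints true
        System.out.println(positiveAndEven.accept(3)); // prints false
        System.out.println(positiveOrEven.accept(-2)); // prints true
        System.out.println(odd.accept(3));             // prints true
    }
}
```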
Copyright © 2012 The Apache Software Foundation. All Rights Reserved.