org.apache.crunch
Class FilterFn<T>

java.lang.Object
  extended by org.apache.crunch.DoFn<T,T>
      extended by org.apache.crunch.FilterFn<T>
All Implemented Interfaces:
Serializable

public abstract class FilterFn<T>
extends DoFn<T,T>

A DoFn for the common case of filtering the members of a PCollection based on a boolean condition.
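The sketch below illustrates the filtering contract without depending on the Crunch jar. `MiniFilterFn` and `FilterSketch.filter` are hypothetical stand-ins for `FilterFn` and for applying it to a PCollection (with the real library this is typically done through `PCollection.filter`), and a `List` stands in for the PCollection. A real subclass would extend `org.apache.crunch.FilterFn<String>` and implement the same `accept` method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.crunch.FilterFn: subclasses supply accept().
abstract class MiniFilterFn<T> {
    // Return true to keep the record, false to drop it.
    abstract boolean accept(T input);
}

// Example filter: keep only non-empty lines that do not start with '#'.
class NonCommentLineFilter extends MiniFilterFn<String> {
    @Override
    boolean accept(String line) {
        return !line.isEmpty() && !line.startsWith("#");
    }
}

class FilterSketch {
    // Hypothetical stand-in for filtering a PCollection: keeps accepted elements in order.
    static <T> List<T> filter(List<T> input, MiniFilterFn<T> fn) {
        List<T> out = new ArrayList<>();
        for (T item : input) {
            if (fn.accept(item)) {
                out.add(item);
            }
        }
        return out;
    }
}
```

For example, filtering `"# header"`, `""`, `"a,1"`, `"b,2"` with `NonCommentLineFilter` keeps only `"a,1"` and `"b,2"`. The subclass decides only *whether* a record survives; it never transforms the record, which is what distinguishes a filter from a general DoFn.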

See Also:
Serialized Form

Constructor Summary
FilterFn()
           
 
Method Summary
abstract  boolean accept(T input)
          Returns true if the given record should be emitted; records for which it returns false are filtered out.
 void cleanup()
          Called during the cleanup of the MapReduce job this FilterFn is associated with.
 void cleanup(Emitter<T> emitter)
          Called during the cleanup of the MapReduce job this DoFn is associated with.
 void process(T input, Emitter<T> emitter)
          Processes the records from a PCollection.
 float scaleFactor()
          Returns an estimate of how applying this function to a PCollection will cause it to change in size.
 
Methods inherited from class org.apache.crunch.DoFn
configure, disableDeepCopy, initialize, setContext
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FilterFn

public FilterFn()
Method Detail

accept

public abstract boolean accept(T input)
Returns true if the given record should be emitted; records for which it returns false are filtered out.
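The relationship between `accept` and `process` can be modeled as follows. This is a sketch, not Crunch source: `MiniEmitter`, `CollectingEmitter`, `MiniFilterFn` and `GreaterThanFilter` are hypothetical types, and the model assumes that `process` forwards the input to the emitter exactly when `accept` returns true, which is what the method descriptions on this page imply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.crunch.Emitter.
interface MiniEmitter<T> {
    void emit(T value);
}

// Simple emitter that collects everything it receives, for inspection.
class CollectingEmitter<T> implements MiniEmitter<T> {
    final List<T> emitted = new ArrayList<>();

    @Override
    public void emit(T value) {
        emitted.add(value);
    }
}

// Model of the FilterFn pattern: process() forwards a record only when accept() says so.
abstract class MiniFilterFn<T> {
    abstract boolean accept(T input);

    void process(T input, MiniEmitter<T> emitter) {
        if (accept(input)) {
            emitter.emit(input); // the record is passed through unchanged
        }
    }
}

// Example predicate: keep integers strictly greater than a threshold.
class GreaterThanFilter extends MiniFilterFn<Integer> {
    private final int threshold;

    GreaterThanFilter(int threshold) {
        this.threshold = threshold;
    }

    @Override
    boolean accept(Integer input) {
        return input > threshold;
    }
}
```

Feeding 5, 10, 11 and 42 through `new GreaterThanFilter(10)` emits only 11 and 42. Because the subclass implements just the predicate, it cannot accidentally emit zero-or-many outputs per input the way an arbitrary DoFn could.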


process

public void process(T input,
                    Emitter<T> emitter)
Description copied from class: DoFn
Processes the records from a PCollection.

Note: Crunch may reuse a single input record object, changing its contents on each call to DoFn.process(Object, Emitter). This behavior is imposed by Hadoop's Reducer implementation: the framework reuses the key and value objects that are passed into reduce, so an application should clone any object it wants to keep a copy of.

Specified by:
process in class DoFn<T,T>
Parameters:
input - The input record.
emitter - The emitter to send the output to
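To make the object-reuse note concrete, the sketch below simulates the framework by feeding every value through one reused mutable object. `MutableRecord`, `RememberingFilter` and `ReuseDemo` are hypothetical names, and no Crunch or Hadoop classes are involved. The filter keeps both a raw reference (unsafe) and a snapshot of the field (safe). For a real Hadoop Writable you would make a deep copy of the object rather than copying an immutable `String` field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable record, standing in for an object the framework may reuse.
class MutableRecord {
    String value;
}

// Hypothetical filter that also remembers what it accepted.
class RememberingFilter {
    final List<MutableRecord> keptReferences = new ArrayList<>(); // unsafe under reuse
    final List<String> keptCopies = new ArrayList<>();            // safe snapshot

    boolean accept(MutableRecord input) {
        boolean keep = input.value.startsWith("a");
        if (keep) {
            keptReferences.add(input);   // stores the shared object, which will be overwritten
            keptCopies.add(input.value); // String is immutable, so this value survives
        }
        return keep;
    }
}

class ReuseDemo {
    // Mimics object reuse: one MutableRecord instance, new contents on every call.
    static RememberingFilter run(String... values) {
        RememberingFilter filter = new RememberingFilter();
        MutableRecord reused = new MutableRecord();
        for (String v : values) {
            reused.value = v;
            filter.accept(reused);
        }
        return filter;
    }
}
```

After `ReuseDemo.run("apple", "banana", "avocado", "cherry")`, `keptCopies` holds `apple` and `avocado`, while both entries in `keptReferences` point at the same object, whose value is now `cherry`.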

cleanup

public final void cleanup(Emitter<T> emitter)
Description copied from class: DoFn
Called during the cleanup of the MapReduce job this DoFn is associated with. Subclasses may override this method to do appropriate cleanup. (In FilterFn this method is declared final; subclasses override cleanup() instead.)

Overrides:
cleanup in class DoFn<T,T>
Parameters:
emitter - The emitter that was used for output

cleanup

public void cleanup()
Called during the cleanup of the MapReduce job this FilterFn is associated with. Subclasses may override this method to do appropriate cleanup.
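The two cleanup methods can be modeled as below. This sketch assumes, without the page stating it outright, that the final `cleanup(Emitter)` simply delegates to the no-argument `cleanup()`; since that hook receives no emitter, a filter has no way to emit extra records at cleanup time. `MiniEmitter`, `MiniFilterFn` and `CountingBlankFilter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.crunch.Emitter.
interface MiniEmitter<T> {
    void emit(T value);
}

abstract class MiniFilterFn<T> {
    abstract boolean accept(T input);

    // Subclass hook; no emitter is passed in, so nothing can be emitted from here.
    void cleanup() {
    }

    // Final, mirroring FilterFn.cleanup(Emitter); assumed here to delegate to cleanup().
    final void cleanup(MiniEmitter<T> emitter) {
        cleanup();
    }
}

// Example: drops blank strings and reports how many it rejected at cleanup time.
class CountingBlankFilter extends MiniFilterFn<String> {
    int rejected = 0;
    final List<String> log = new ArrayList<>();

    @Override
    boolean accept(String input) {
        boolean keep = !input.trim().isEmpty();
        if (!keep) {
            rejected++;
        }
        return keep;
    }

    @Override
    void cleanup() {
        log.add("rejected " + rejected + " blank records");
    }
}
```

Overriding the no-argument `cleanup()` is the only way to add end-of-task logic in this model, which keeps the filter's output limited to records that passed `accept`.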


scaleFactor

public float scaleFactor()
Description copied from class: DoFn
Returns an estimate of how applying this function to a PCollection will cause it to change in size. The optimizer uses these estimates to decide where to break up dependent MapReduce jobs into separate Map and Reduce phases in order to minimize I/O.

Subclasses of DoFn that will substantially alter the size of the resulting PCollection should override this method.

Overrides:
scaleFactor in class DoFn<T,T>
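A highly selective filter is exactly the kind of function that substantially shrinks its output, so it is a natural candidate for overriding `scaleFactor`. In the sketch below, `MiniFilterFn`, `ErrorLineFilter` and `SizeEstimate` are hypothetical; the base default of 0.5 is a placeholder assumed for the sketch (check the Crunch source for the real `FilterFn` value), and multiplying an input size by the factor only illustrates what the number means, not how the Crunch planner actually uses it.

```java
// Hypothetical stand-in for FilterFn with an overridable size estimate.
abstract class MiniFilterFn<T> {
    abstract boolean accept(T input);

    // Placeholder default assumed for this sketch; the real FilterFn default may differ.
    float scaleFactor() {
        return 0.5f;
    }
}

// A very selective filter: expected to keep roughly 1 record in 100.
class ErrorLineFilter extends MiniFilterFn<String> {
    @Override
    boolean accept(String line) {
        return line.startsWith("ERROR");
    }

    @Override
    float scaleFactor() {
        return 0.01f; // tells the planner the output is far smaller than the input
    }
}

class SizeEstimate {
    // Illustrative only: scales an input size by the function's factor.
    static long estimateOutputBytes(long inputBytes, MiniFilterFn<?> fn) {
        return Math.round(inputBytes * (double) fn.scaleFactor());
    }
}
```

With these values, a 1,000,000-byte input is estimated at about 10,000 bytes after `ErrorLineFilter`, versus 500,000 bytes under the placeholder default.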


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.