This project has retired. For details please refer to its Attic page.
FilterFn (Apache Crunch 0.3.0-incubating API)

org.apache.crunch
Class FilterFn<T>

java.lang.Object
  extended by org.apache.crunch.DoFn<T,T>
      extended by org.apache.crunch.FilterFn<T>
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
FilterFn.AndFn, FilterFn.NotFn, FilterFn.OrFn

public abstract class FilterFn<T>
extends DoFn<T,T>

A DoFn for the common case of filtering the members of a PCollection based on a boolean condition.

See Also:
Serialized Form

Nested Class Summary
static class FilterFn.AndFn<S>
           
static class FilterFn.NotFn<S>
           
static class FilterFn.OrFn<S>
           
 
Constructor Summary
FilterFn()
           
 
Method Summary
abstract  boolean accept(T input)
          Returns true if the given record should be emitted.
static
<S> FilterFn<S>
and(FilterFn<S>... fns)
           
static
<S> FilterFn<S>
not(FilterFn<S> fn)
           
static
<S> FilterFn<S>
or(FilterFn<S>... fns)
           
 void process(T input, Emitter<T> emitter)
          Processes the records from a PCollection.
 float scaleFactor()
          Returns an estimate of how applying this function to a PCollection will cause it to change in size.
 
Methods inherited from class org.apache.crunch.DoFn
cleanup, configure, initialize, setConfigurationForTest, setContext
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FilterFn

public FilterFn()
Method Detail

accept

public abstract boolean accept(T input)
Returns true if the given record should be emitted.
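
To illustrate how a subclass supplies the condition, here is a minimal sketch. It uses a stand-in class, SketchFilter<T>, rather than the real org.apache.crunch.FilterFn, so that it compiles without Crunch on the classpath. SketchFilter, NonEmptyFilter, and the keep helper are hypothetical names; only accept(T) mirrors the signature documented above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for FilterFn<T>; only accept(T) mirrors the documented signature.
abstract class SketchFilter<T> {
    abstract boolean accept(T input);

    // Hypothetical helper: keeps the elements for which accept returns true.
    List<T> keep(List<T> inputs) {
        List<T> out = new ArrayList<>();
        for (T input : inputs) {
            if (accept(input)) {
                out.add(input);
            }
        }
        return out;
    }
}

// Keeps only strings that are non-empty after trimming whitespace.
class NonEmptyFilter extends SketchFilter<String> {
    @Override
    boolean accept(String input) {
        return input != null && !input.trim().isEmpty();
    }
}
```

With this sketch, new NonEmptyFilter().keep(Arrays.asList("a", "", "  ", "b")) yields a list containing "a" and "b".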


process

public void process(T input,
                    Emitter<T> emitter)
Description copied from class: DoFn
Processes the records from a PCollection.

Note: Crunch can reuse a single input record object, changing its content on each DoFn.process(Object, Emitter) call. This behavior is inherited from Hadoop's Reducer implementation, in which the framework reuses the key and value objects that are passed into reduce. An application should therefore clone any object it wants to keep a copy of.

Specified by:
process in class DoFn<T,T>
Parameters:
input - The input record.
emitter - The emitter to send the output to.
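
The object-reuse note above can be illustrated with plain Java. This is a sketch only: MutableRecord and ReuseSketch are hypothetical names, and the loop merely simulates a framework that refills one shared object before each call. It does not use any Crunch or Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable record, standing in for a reusable Hadoop-style value object.
class MutableRecord {
    String value;

    MutableRecord(String value) {
        this.value = value;
    }

    MutableRecord copy() {
        return new MutableRecord(value);
    }
}

class ReuseSketch {
    // Simulates a framework that refills one shared record object before each call.
    static List<MutableRecord> collect(String[] values, boolean copyBeforeKeeping) {
        MutableRecord shared = new MutableRecord(null);
        List<MutableRecord> kept = new ArrayList<>();
        for (String v : values) {
            shared.value = v; // same object, new content
            kept.add(copyBeforeKeeping ? shared.copy() : shared);
        }
        return kept;
    }
}
```

For the values "a", "b", "c", every element returned by collect(values, false) refers to the same object and reads "c", while collect(values, true) preserves "a", "b", and "c" separately.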

scaleFactor

public float scaleFactor()
Description copied from class: DoFn
Returns an estimate of how applying this function to a PCollection will cause it to change in size. The optimizer uses these estimates to decide where to break up dependent MR jobs into separate Map and Reduce phases in order to minimize I/O.

Subclasses of DoFn that will substantially alter the size of the resulting PCollection should override this method.

Overrides:
scaleFactor in class DoFn<T,T>
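
As a rough illustration of overriding the size estimate, the sketch below uses a stand-in class instead of the real DoFn or FilterFn. SizedFilter, RareMatchFilter, and estimateOutputSize are hypothetical names, and both numeric values are placeholders chosen for the example; they are not taken from the Crunch source.

```java
// Hypothetical stand-in for a filtering function; not the real org.apache.crunch.FilterFn.
abstract class SizedFilter<T> {
    abstract boolean accept(T input);

    // Placeholder default for this sketch only; not the library's actual default.
    float scaleFactor() {
        return 1.0f;
    }

    // Hypothetical helper showing how a planner might apply the estimate.
    long estimateOutputSize(long inputSize) {
        return (long) (inputSize * scaleFactor());
    }
}

// A filter assumed to keep roughly one record in four, so it reports a smaller factor.
class RareMatchFilter extends SizedFilter<String> {
    @Override
    boolean accept(String input) {
        return input.startsWith("ERROR");
    }

    @Override
    float scaleFactor() {
        return 0.25f;
    }
}
```

In this sketch, an input of 1000 records gives an estimated output of 250 records for RareMatchFilter.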

and

public static <S> FilterFn<S> and(FilterFn<S>... fns)

or

public static <S> FilterFn<S> or(FilterFn<S>... fns)

not

public static <S> FilterFn<S> not(FilterFn<S> fn)
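
This page does not describe the three combinators, so the following sketch assumes the semantics their names suggest: and accepts a record only if every filter accepts it, or accepts it if any filter does, and not inverts a single filter. Pred is a hypothetical stand-in for FilterFn, and CombinatorDemo is an illustrative name; the real implementations may differ, for example in how an empty argument list is handled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for FilterFn; the static methods mirror the documented signatures.
abstract class Pred<T> {
    abstract boolean accept(T input);

    // Assumed semantics: accepts only if every given filter accepts.
    @SafeVarargs
    static <S> Pred<S> and(Pred<S>... fns) {
        final List<Pred<S>> all = Arrays.asList(fns);
        return new Pred<S>() {
            @Override
            boolean accept(S input) {
                for (Pred<S> fn : all) {
                    if (!fn.accept(input)) {
                        return false;
                    }
                }
                return true;
            }
        };
    }

    // Assumed semantics: accepts if at least one given filter accepts.
    @SafeVarargs
    static <S> Pred<S> or(Pred<S>... fns) {
        final List<Pred<S>> all = Arrays.asList(fns);
        return new Pred<S>() {
            @Override
            boolean accept(S input) {
                for (Pred<S> fn : all) {
                    if (fn.accept(input)) {
                        return true;
                    }
                }
                return false;
            }
        };
    }

    // Assumed semantics: accepts exactly when the given filter rejects.
    static <S> Pred<S> not(final Pred<S> fn) {
        return new Pred<S>() {
            @Override
            boolean accept(S input) {
                return !fn.accept(input);
            }
        };
    }
}

class CombinatorDemo {
    static final Pred<Integer> POSITIVE = new Pred<Integer>() {
        @Override
        boolean accept(Integer n) {
            return n > 0;
        }
    };

    static final Pred<Integer> EVEN = new Pred<Integer>() {
        @Override
        boolean accept(Integer n) {
            return n % 2 == 0;
        }
    };
}
```

Under these assumptions, Pred.and(CombinatorDemo.POSITIVE, CombinatorDemo.EVEN) accepts 4 but rejects 3 and -2, and Pred.not(CombinatorDemo.EVEN) accepts 3.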


Copyright © 2012 The Apache Software Foundation. All Rights Reserved.