This project has retired. For details please refer to its Attic page.
BloomFilterFn (Apache Crunch 0.11.0 API)

org.apache.crunch.contrib.bloomfilter
Class BloomFilterFn<S>

java.lang.Object
  extended by org.apache.crunch.DoFn<S,Pair<String,org.apache.hadoop.util.bloom.BloomFilter>>
      extended by org.apache.crunch.contrib.bloomfilter.BloomFilterFn<S>
All Implemented Interfaces:
Serializable

public abstract class BloomFilterFn<S>
extends DoFn<S,Pair<String,org.apache.hadoop.util.bloom.BloomFilter>>

This class is responsible for generating the keys that are used in a BloomFilter.

See Also:
Serialized Form

Field Summary
static String CRUNCH_FILTER_NAME
           
static String CRUNCH_FILTER_SIZE
           
 
Constructor Summary
BloomFilterFn()
           
 
Method Summary
 void cleanup(Emitter<Pair<String,org.apache.hadoop.util.bloom.BloomFilter>> emitter)
          Called during the cleanup of the MapReduce job this DoFn is associated with.
abstract  Collection<org.apache.hadoop.util.bloom.Key> generateKeys(S input)
           
 void initialize()
          Initialize this DoFn.
 void process(S input, Emitter<Pair<String,org.apache.hadoop.util.bloom.BloomFilter>> emitter)
          Processes the records from a PCollection.
 
Methods inherited from class org.apache.crunch.DoFn
configure, disableDeepCopy, scaleFactor, setConfiguration, setContext
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CRUNCH_FILTER_SIZE

public static final String CRUNCH_FILTER_SIZE
See Also:
Constant Field Values

CRUNCH_FILTER_NAME

public static final String CRUNCH_FILTER_NAME
See Also:
Constant Field Values
Constructor Detail

BloomFilterFn

public BloomFilterFn()
Method Detail

initialize

public void initialize()
Description copied from class: DoFn
Initialize this DoFn. Initialization happens before DoFn.process(Object, Emitter) is first called. Subclasses may override this method to do appropriate initialization.

Called during the setup of the job instance this DoFn is associated with.

Overrides:
initialize in class DoFn<S,Pair<String,org.apache.hadoop.util.bloom.BloomFilter>>

process

public void process(S input,
                    Emitter<Pair<String,org.apache.hadoop.util.bloom.BloomFilter>> emitter)
Description copied from class: DoFn
Processes the records from a PCollection.

Note: Crunch can reuse a single input record object whose content changes on each call to DoFn.process(Object, Emitter). This behavior is imposed by Hadoop's Reducer implementation: the framework reuses the key and value objects that are passed into reduce, so an application should clone any object it wants to keep a copy of.
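
The aliasing problem described in the note can be reproduced without Hadoop. The following is a minimal sketch, assuming a hypothetical mutable Record class that stands in for a reused input object; none of these names come from Crunch.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseDemo {
  // Hypothetical mutable record standing in for a reused input object.
  static final class Record {
    String value;
    Record(String value) { this.value = value; }
    Record copy() { return new Record(value); }
  }

  // Keeps references to the one reused instance, so every entry aliases it.
  static List<String> keepReferences(String[] inputs) {
    Record reused = new Record(null);
    List<Record> kept = new ArrayList<>();
    for (String s : inputs) {
      reused.value = s;   // the "framework" overwrites the same object
      kept.add(reused);   // bug: stores the shared reference
    }
    return values(kept);
  }

  // Clones the record before keeping it, so each entry is independent.
  static List<String> keepCopies(String[] inputs) {
    Record reused = new Record(null);
    List<Record> kept = new ArrayList<>();
    for (String s : inputs) {
      reused.value = s;
      kept.add(reused.copy());
    }
    return values(kept);
  }

  private static List<String> values(List<Record> records) {
    List<String> out = new ArrayList<>();
    for (Record r : records) {
      out.add(r.value);
    }
    return out;
  }

  public static void main(String[] args) {
    String[] inputs = {"a", "b", "c"};
    System.out.println(keepReferences(inputs)); // prints [c, c, c]
    System.out.println(keepCopies(inputs));     // prints [a, b, c]
  }
}
```

Only the copying variant preserves the values seen on each call, which is why a DoFn that buffers its inputs should clone them first.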

Specified by:
process in class DoFn<S,Pair<String,org.apache.hadoop.util.bloom.BloomFilter>>
Parameters:
input - The input record.
emitter - The emitter to send the output to

generateKeys

public abstract Collection<org.apache.hadoop.util.bloom.Key> generateKeys(S input)
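
This page gives no description for generateKeys, so the following shows only one plausible way an implementation might derive key material from a String input. It is a JDK-only sketch: the class WordKeyBytes and the method toKeyBytes are hypothetical, and in a real subclass each byte array would presumably be wrapped as new org.apache.hadoop.util.bloom.Key(bytes) before being returned from generateKeys.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class WordKeyBytes {
  // Splits a line on whitespace and returns one UTF-8 byte array per word.
  // Assumption: one key per word; other inputs may call for other schemes.
  static List<byte[]> toKeyBytes(String line) {
    List<byte[]> out = new ArrayList<>();
    if (line == null) {
      return out;
    }
    for (String word : line.trim().split("\\s+")) {
      if (!word.isEmpty()) {
        out.add(word.getBytes(StandardCharsets.UTF_8));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    for (byte[] bytes : toKeyBytes("Apache  Crunch")) {
      System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
  }
}
```

Returning an empty collection for blank or null input keeps the caller free of null checks.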

cleanup

public void cleanup(Emitter<Pair<String,org.apache.hadoop.util.bloom.BloomFilter>> emitter)
Description copied from class: DoFn
Called during the cleanup of the MapReduce job this DoFn is associated with. Subclasses may override this method to do appropriate cleanup.

Overrides:
cleanup in class DoFn<S,Pair<String,org.apache.hadoop.util.bloom.BloomFilter>>
Parameters:
emitter - The emitter that was used for output
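
Because cleanup receives the emitter and the output type is a single named BloomFilter, one plausible reading is that the filter is accumulated across process calls and emitted once during cleanup. The sketch below illustrates that accumulate-then-emit pattern under that assumption. Sink, AccumulateThenEmit, and the BitSet stand-in are hypothetical and are not the Crunch or Hadoop types.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class AccumulateThenEmit {
  // Hypothetical stand-in for an emitter; not the Crunch Emitter interface.
  interface Sink<T> {
    void emit(T value);
  }

  private final int size;
  private BitSet bits; // toy stand-in for a Bloom filter's bit vector

  AccumulateThenEmit(int size) {
    this.size = size;
  }

  // Runs once before any input: allocate the accumulator.
  void initialize() {
    bits = new BitSet(size);
  }

  // Runs per input: update the accumulator, emit nothing.
  void process(String input) {
    bits.set(Math.floorMod(input.hashCode(), size));
  }

  // Runs once after all inputs: emit the accumulated result.
  void cleanup(Sink<BitSet> sink) {
    sink.emit(bits);
  }

  // Drives the three lifecycle steps in order and collects what was emitted.
  static List<BitSet> run(int size, String... inputs) {
    List<BitSet> emitted = new ArrayList<>();
    AccumulateThenEmit fn = new AccumulateThenEmit(size);
    fn.initialize();
    for (String s : inputs) {
      fn.process(s);
    }
    fn.cleanup(emitted::add);
    return emitted;
  }

  public static void main(String[] args) {
    List<BitSet> out = run(64, "a", "b", "a");
    System.out.println(out.size() + " value emitted: " + out.get(0));
  }
}
```

Emitting once at cleanup yields a single value per task regardless of how many inputs were processed.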


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.