public class BloomFilterJoinStrategy<K,U,V> extends Object implements JoinStrategy<K,U,V>
This strategy is useful when the right-side table contains many keys that are not present in the left-side table. In that case, filtering the right-side table with a Bloom filter built from the left-side keys avoids a potentially costly shuffle of records that would never be joined to the left side.
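The filtering idea can be sketched in plain Java. This is a minimal, in-memory illustration under stated assumptions: `SimpleBloomFilter` and `BloomPrefilter.prefilterRight` are hypothetical names, not part of the Crunch API. The real strategy operates on distributed `PTable`s rather than lists. The sketch shows the key property: a Bloom filter trained on the left keys never rejects a right record that has a join partner, so dropping rejected records is safe. It may still pass a few records that have no partner.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal in-memory Bloom filter over String keys (not Crunch's implementation).
class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1; // force an odd step
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    // false: key is definitely absent; true: key may be present (possible false positive).
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}

class BloomPrefilter {
    // Train the filter on the left-side keys, then keep only the right-side records whose
    // key might have a join partner. Only the kept records would need to be shuffled.
    static <V> List<Map.Entry<String, V>> prefilterRight(
            List<String> leftKeys, List<Map.Entry<String, V>> right, SimpleBloomFilter filter) {
        for (String key : leftKeys) {
            filter.add(key);
        }
        List<Map.Entry<String, V>> kept = new ArrayList<>();
        for (Map.Entry<String, V> record : right) {
            if (filter.mightContain(record.getKey())) {
                kept.add(record);
            }
        }
        return kept;
    }
}
```

Right-side records with keys such as `"a"` that appear on the left are always kept. Records with keys absent from the left are usually dropped, though a false positive may let one through to the join.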
Implementation Note: right and full outer joins are handled by splitting the right-side table (the larger one) into two disjoint streams: a negatively filtered stream (the right outer part) and a positively filtered stream (passed to the delegate strategy).
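The split for a right outer join can be sketched in memory. This is a simplified sketch, not the library's code, and it rests on several assumptions. The left table is a `Map` with unique keys. A `Predicate` stands in for the Bloom filter. A plain loop stands in for the delegate `JoinStrategy`. Rows are encoded as hypothetical `"key:left:right"` strings. Only the right outer case is shown. Records the filter rejects are known to have no left partner, so they are emitted directly with a `null` left side. Records the filter accepts go through the real join, where a false positive simply finds no match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class RightOuterSplit {
    // Hypothetical sketch: right outer join of `right` against `left`, using `mightContain`
    // (a stand-in for the Bloom filter) to split the right side into two disjoint streams.
    static List<String> rightOuterJoin(Map<String, String> left,
                                       List<Map.Entry<String, String>> right,
                                       Predicate<String> mightContain) {
        List<Map.Entry<String, String>> positive = new ArrayList<>();
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> r : right) {
            if (mightContain.test(r.getKey())) {
                // Positively filtered: may have a left partner, so it goes to the delegate join.
                positive.add(r);
            } else {
                // Negatively filtered: definitely no left partner; emit as a right-outer row.
                result.add(r.getKey() + ":null:" + r.getValue());
            }
        }
        // Stand-in for the delegate strategy: right outer join of the positive stream.
        // A false positive finds no match here and also ends up with a null left side.
        for (Map.Entry<String, String> r : positive) {
            String l = left.get(r.getKey());
            result.add(r.getKey() + ":" + l + ":" + r.getValue());
        }
        return result;
    }
}
```

Because the two streams are disjoint and together cover the whole right side, every right record appears exactly once in the output, as a right outer join requires.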
| Constructor | Description |
|---|---|
| BloomFilterJoinStrategy(int numElements) | Instantiate with the expected number of unique keys in the left table. |
| BloomFilterJoinStrategy(int numElements, float falsePositiveRate) | Instantiate with the expected number of unique keys in the left table and the acceptable false positive rate for the Bloom filter. |
| BloomFilterJoinStrategy(int numElements, float falsePositiveRate, JoinStrategy<K,U,V> delegateJoinStrategy) | Instantiate with the expected number of unique keys in the left table, the acceptable false positive rate for the Bloom filter, and an underlying join strategy to delegate to. |
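The `numElements` and `falsePositiveRate` parameters together determine how large the Bloom filter must be. The textbook sizing rule for n expected keys and target false-positive rate p is m = ceil(-n ln p / (ln 2)^2) bits and k = round((m / n) ln 2) hash functions. Whether `BloomFilterJoinStrategy` uses exactly these formulas internally is an assumption, and `BloomSizing` is a hypothetical helper. The sketch only illustrates how the two parameters trade memory against filtering accuracy.

```java
// Hypothetical sizing helper using the textbook Bloom filter formulas.
class BloomSizing {
    // m = ceil(-n * ln(p) / (ln 2)^2): bits needed for n keys at false-positive rate p.
    static int optimalNumBits(int numElements, float falsePositiveRate) {
        double ln2 = Math.log(2);
        return (int) Math.ceil(-numElements * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    // k = round((m / n) * ln 2): number of hash functions, at least 1.
    static int optimalNumHashes(int numElements, int numBits) {
        return Math.max(1, (int) Math.round((double) numBits / numElements * Math.log(2)));
    }
}
```

For example, 1,000 expected keys at a 1% false-positive rate give roughly 9,586 bits (about 1.2 KB) and 7 hash functions. Lowering the rate to 0.1% raises the bit count to about 14,400.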
public BloomFilterJoinStrategy(int numElements)
The DefaultJoinStrategy will be used to perform the actual join after filtering.
Parameters:
numElements - expected number of unique keys

public BloomFilterJoinStrategy(int numElements,
float falsePositiveRate)
The DefaultJoinStrategy will be used to perform the actual join after filtering.
Parameters:
numElements - expected number of unique keys
falsePositiveRate - acceptable false positive rate for the Bloom filter

public BloomFilterJoinStrategy(int numElements,
float falsePositiveRate,
JoinStrategy<K,U,V> delegateJoinStrategy)
Parameters:
numElements - expected number of unique keys
falsePositiveRate - acceptable false positive rate for the Bloom filter
delegateJoinStrategy - join strategy to delegate to after filtering

public PTable<K,Pair<U,V>> join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType)
Specified by:
join in interface JoinStrategy<K,U,V>
Parameters:
left - left table to be joined
right - right table to be joined
joinType - type of join to perform

Copyright © 2017 The Apache Software Foundation. All rights reserved.