public class BloomFilterJoinStrategy<K,U,V> extends Object implements JoinStrategy<K,U,V>
This strategy is useful in cases where the right-side table contains many keys that are not present in the left-side table. In this case, the use of the Bloom filter avoids a potentially costly shuffle phase for data that would never be joined to the left side.
Implementation Note: right and full outer join types are handled by splitting the right-side table (the bigger one) into two disjoint streams: negatively filtered (the right outer part) and positively filtered (passed to the delegate strategy).
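The split described in the note above can be sketched in plain Java, without any Crunch types. This is only an illustration of the idea, not the Crunch implementation: the class `BloomPartitionSketch`, the toy `SimpleBloomFilter`, its double-hashing scheme, and the `split` helper are all hypothetical names introduced here. The filter is trained on the left-side keys; right-side keys that might match go to the "positive" stream (which would be handed to the delegate join), and keys that definitely do not match go to the "negative" stream (which would only matter for right and full outer joins).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Illustrative sketch only; hypothetical names, not the Crunch implementation. */
public class BloomPartitionSketch {

    /** Toy Bloom filter over String keys using simple double hashing. */
    public static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        public SimpleBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // i-th bit position for a key: h1 + i * h2, reduced into [0, numBits).
        private int index(String key, int i) {
            int h1 = key.hashCode();
            int h2 = (h1 >>> 16) | 1; // force an odd step so positions vary with i
            return Math.floorMod(h1 + i * h2, numBits);
        }

        public void add(String key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(key, i));
            }
        }

        /** False means "definitely absent"; true means "possibly present". */
        public boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** The two disjoint streams produced from the right-side keys. */
    public static final class Split {
        public final List<String> positive = new ArrayList<>(); // may join; goes to the delegate
        public final List<String> negative = new ArrayList<>(); // cannot join; right outer part
    }

    /** Trains a filter on the left keys, then partitions the right keys with it. */
    public static Split split(List<String> leftKeys, List<String> rightKeys,
                              int numBits, int numHashes) {
        SimpleBloomFilter filter = new SimpleBloomFilter(numBits, numHashes);
        for (String key : leftKeys) {
            filter.add(key);
        }
        Split result = new Split();
        for (String key : rightKeys) {
            if (filter.mightContain(key)) {
                result.positive.add(key);
            } else {
                result.negative.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Split s = split(Arrays.asList("a", "b", "c"),
                        Arrays.asList("a", "x", "b", "y", "z"), 1024, 3);
        System.out.println("positive=" + s.positive + " negative=" + s.negative);
    }
}
```

A Bloom filter never produces false negatives, so every right-side key that really exists on the left ends up in the positive stream. False positives are possible, which is why the positive stream still has to go through a real join.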
Constructor and Description |
---|
BloomFilterJoinStrategy(int numElements): Instantiate with the expected number of unique keys in the left table. |
BloomFilterJoinStrategy(int numElements, float falsePositiveRate): Instantiate with the expected number of unique keys in the left table and the acceptable false positive rate for the Bloom filter. |
BloomFilterJoinStrategy(int numElements, float falsePositiveRate, JoinStrategy<K,U,V> delegateJoinStrategy): Instantiate with the expected number of unique keys in the left table, the acceptable false positive rate for the Bloom filter, and an underlying join strategy to delegate to. |
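
To give a feel for how `numElements` and `falsePositiveRate` interact, the sketch below applies the standard textbook Bloom filter sizing formulas: m = ceil(-n ln p / (ln 2)^2) bits and k = round((m / n) ln 2) hash functions. Whether this class sizes its filter in exactly this way is an assumption; the class `BloomSizingSketch` and its methods are hypothetical and exist only for illustration.

```java
/** Illustrative sketch only; textbook formulas, not necessarily what Crunch uses. */
public class BloomSizingSketch {

    /** Bits needed for n expected keys at target false positive rate p. */
    public static long optimalNumBits(int numElements, double falsePositiveRate) {
        if (numElements <= 0) {
            throw new IllegalArgumentException("numElements must be positive");
        }
        if (falsePositiveRate <= 0.0 || falsePositiveRate >= 1.0) {
            throw new IllegalArgumentException("falsePositiveRate must be in (0, 1)");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-numElements * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    /** Number of hash functions that minimizes the false positive rate for m bits and n keys. */
    public static int optimalNumHashes(int numElements, long numBits) {
        long rounded = Math.round((double) numBits / numElements * Math.log(2));
        return (int) Math.max(1L, rounded);
    }

    public static void main(String[] args) {
        int n = 1000;
        double p = 0.01;
        long m = optimalNumBits(n, p);
        int k = optimalNumHashes(n, m);
        System.out.println("bits=" + m + " hashes=" + k);
    }
}
```

For 1,000 expected keys and a 1% target rate, these formulas give 9,586 bits and 7 hash functions. Underestimating `numElements` fills the filter faster than planned, which raises the real false positive rate and lets more non-matching right-side records through to the shuffle.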
public BloomFilterJoinStrategy(int numElements)
The DefaultJoinStrategy will be used to perform the actual join after filtering.
numElements - expected number of unique keys

public BloomFilterJoinStrategy(int numElements, float falsePositiveRate)
The DefaultJoinStrategy will be used to perform the actual join after filtering.
numElements - expected number of unique keys
falsePositiveRate - acceptable false positive rate for the Bloom filter

public BloomFilterJoinStrategy(int numElements, float falsePositiveRate, JoinStrategy<K,U,V> delegateJoinStrategy)
numElements - expected number of unique keys
falsePositiveRate - acceptable false positive rate for the Bloom filter
delegateJoinStrategy - join strategy to delegate to after filtering

public PTable<K,Pair<U,V>> join(PTable<K,U> left, PTable<K,V> right, JoinType joinType)
Specified by: join in interface JoinStrategy<K,U,V>
left - left table to be joined
right - right table to be joined
joinType - type of join to perform

Copyright © 2017 The Apache Software Foundation. All rights reserved.