java.lang.Object
  org.apache.crunch.lib.join.BloomFilterJoinStrategy<K,U,V>
public class BloomFilterJoinStrategy<K,U,V>
Join strategy that trains a Bloom filter on the keys of the left-side table and uses it to filter the key/value pairs of the right-side table before they are sent through the shuffle and reduce phases.
This strategy is useful when the right-side table contains many keys that are not present in the left-side table. In such cases, the Bloom filter avoids a potentially costly shuffle phase for data that would never be joined to the left side.
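The filter-then-join idea described above can be illustrated with a small in-memory sketch. This is not the Crunch implementation: the class `BloomJoinSketch`, the helper `SimpleBloomFilter`, the fixed filter size, and the use of plain maps in place of `PTable` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy, in-memory version of the filter-then-join idea.
// None of these names come from Apache Crunch.
class BloomJoinSketch {

    /** Minimal Bloom filter over String keys using double hashing. */
    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        SimpleBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        private int index(String key, int i) {
            int h1 = key.hashCode();
            int h2 = (h1 >>> 16) | 1; // odd step so successive probes differ
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void add(String key) {
            for (int i = 0; i < numHashes; i++) bits.set(index(key, i));
        }

        boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) return false;
            }
            return true;
        }
    }

    /** Inner join of two small key/value maps, pre-filtering the right side. */
    static List<String> filteredInnerJoin(Map<String, String> left, Map<String, String> right) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1024, 3);
        for (String key : left.keySet()) filter.add(key); // train on left keys

        // Drop right-side entries whose key is definitely absent from the left.
        Map<String, String> survivors = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : right.entrySet()) {
            if (filter.mightContain(e.getKey())) survivors.put(e.getKey(), e.getValue());
        }

        // The join itself matches keys exactly, so false positives are harmless.
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            String rightValue = survivors.get(e.getKey());
            if (rightValue != null) joined.add(e.getKey() + ":" + e.getValue() + "," + rightValue);
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> left = new LinkedHashMap<>();
        left.put("a", "1");
        left.put("b", "2");
        Map<String, String> right = new LinkedHashMap<>();
        right.put("b", "x");
        right.put("c", "y");
        right.put("d", "z");
        System.out.println(filteredInnerJoin(left, right)); // prints [b:2,x]
    }
}
```

A Bloom filter can report false positives but never false negatives, so the pre-filter may let some unmatched right-side records through but never drops a record that would have joined. The exact join that follows removes any false positives.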
Constructor Summary | |
---|---|
BloomFilterJoinStrategy(int numElements) | Instantiate with the expected number of unique keys in the left table. |
BloomFilterJoinStrategy(int numElements, float falsePositiveRate) | Instantiate with the expected number of unique keys in the left table and the acceptable false positive rate for the Bloom filter. |
BloomFilterJoinStrategy(int numElements, float falsePositiveRate, JoinStrategy<K,U,V> delegateJoinStrategy) | Instantiate with the expected number of unique keys in the left table, the acceptable false positive rate for the Bloom filter, and an underlying join strategy to delegate to. |
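To give a feel for how `numElements` and `falsePositiveRate` interact, the sketch below applies the textbook Bloom filter sizing formulas. Whether Crunch sizes its filter this way is an assumption; the class `BloomSizingSketch` and its methods are hypothetical and only show why a lower false positive rate or a larger key count leads to a larger filter.

```java
// Illustrative only: standard Bloom filter sizing formulas, not Crunch code.
class BloomSizingSketch {

    /** Optimal number of bits: m = ceil(-n * ln(p) / (ln 2)^2). */
    static long optimalNumBits(int numElements, float falsePositiveRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-numElements * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    /** Optimal number of hash functions: k = round((m / n) * ln 2), at least 1. */
    static int optimalNumHashes(int numElements, long numBits) {
        return Math.max(1, (int) Math.round((double) numBits / numElements * Math.log(2)));
    }

    public static void main(String[] args) {
        int numElements = 1_000_000;     // expected unique keys in the left table
        float falsePositiveRate = 0.01f; // acceptable false positive rate
        long bits = optimalNumBits(numElements, falsePositiveRate);
        int hashes = optimalNumHashes(numElements, bits);
        System.out.println(bits + " bits, " + hashes + " hash functions");
    }
}
```

Under these formulas, one million keys at a 1% false positive rate need roughly 9.6 million bits (a little over 1 MB) and 7 hash functions. Underestimating `numElements` makes the real false positive rate higher than requested, so more unmatched right-side records reach the shuffle.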
Method Summary | |
---|---|
PTable<K,Pair<U,V>> | join(PTable<K,U> left, PTable<K,V> right, JoinType joinType): Join two tables with the given join type. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|

public BloomFilterJoinStrategy(int numElements)

The DefaultJoinStrategy will be used to perform the actual join after filtering.

Parameters:
numElements - expected number of unique keys

public BloomFilterJoinStrategy(int numElements, float falsePositiveRate)

The DefaultJoinStrategy will be used to perform the actual join after filtering.

Parameters:
numElements - expected number of unique keys
falsePositiveRate - acceptable false positive rate for the Bloom filter

public BloomFilterJoinStrategy(int numElements, float falsePositiveRate, JoinStrategy<K,U,V> delegateJoinStrategy)

Parameters:
numElements - expected number of unique keys
falsePositiveRate - acceptable false positive rate for the Bloom filter
delegateJoinStrategy - join strategy to delegate to after filtering

Method Detail |
---|
public PTable<K,Pair<U,V>> join(PTable<K,U> left, PTable<K,V> right, JoinType joinType)

Description copied from interface: JoinStrategy
Join two tables with the given join type.

Specified by:
join in interface JoinStrategy<K,U,V>

Parameters:
left - left table to be joined
right - right table to be joined
joinType - type of join to perform