java.lang.Object
  org.apache.crunch.lib.join.BloomFilterJoinStrategy<K,U,V>
public class BloomFilterJoinStrategy<K,U,V>
Join strategy that trains a Bloom filter on the keys of the left-side table and uses it to filter the key/value pairs of the right-side table before they are sent through the shuffle and reduce phases.
This strategy is useful when the right-side table contains many keys that are not present in the left-side table. In that case, the Bloom filter avoids a potentially costly shuffle of data that would never be joined to the left side.
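The filtering idea described above can be sketched in plain Java without any Crunch dependency. This is a minimal illustration under stated assumptions, not Crunch's implementation: the class `BloomPrefilterSketch`, its methods, and the double-hashing scheme are all hypothetical. It trains a small Bloom filter on the left keys and keeps only the right-side pairs whose keys might be present.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of Crunch. */
final class BloomPrefilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomPrefilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Bit position for the i-th probe, derived from hashCode() by double hashing.
    private int index(Object key, int i) {
        int h1 = key.hashCode() * 0x9E3779B9; // spread the bits of small hash codes
        int h2 = (h1 >>> 16) | 1;             // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(Object key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    // False means "definitely absent"; true means "possibly present".
    boolean mightContain(Object key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Trains a filter on the left keys, then drops right pairs that cannot match. */
    static <K, V> List<Map.Entry<K, V>> filterRight(
            Iterable<K> leftKeys, Iterable<Map.Entry<K, V>> right, int numBits, int numHashes) {
        BloomPrefilterSketch filter = new BloomPrefilterSketch(numBits, numHashes);
        for (K key : leftKeys) {
            filter.add(key);
        }
        List<Map.Entry<K, V>> kept = new ArrayList<>();
        for (Map.Entry<K, V> pair : right) {
            if (filter.mightContain(pair.getKey())) {
                kept.add(pair);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> leftKeys = Arrays.asList("a", "b", "c");
        List<Map.Entry<String, Integer>> right = new ArrayList<>();
        right.add(new SimpleImmutableEntry<>("a", 1));
        right.add(new SimpleImmutableEntry<>("z", 2));
        right.add(new SimpleImmutableEntry<>("b", 3));
        // "a" and "b" always survive; "z" is dropped unless it is a false positive.
        System.out.println(filterRight(leftKeys, right, 1024, 3));
    }
}
```

A Bloom filter never produces false negatives, so every right-side pair that could join is kept. False positives only cost some extra shuffled data, which the subsequent join discards.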
| Constructor Summary | |
|---|---|
| BloomFilterJoinStrategy(int numElements) | Instantiate with the expected number of unique keys in the left table. |
| BloomFilterJoinStrategy(int numElements, float falsePositiveRate) | Instantiate with the expected number of unique keys in the left table and the acceptable false positive rate for the Bloom filter. |
| BloomFilterJoinStrategy(int numElements, float falsePositiveRate, JoinStrategy<K,U,V> delegateJoinStrategy) | Instantiate with the expected number of unique keys in the left table, the acceptable false positive rate for the Bloom filter, and an underlying join strategy to delegate to. |

| Method Summary | |
|---|---|
| PTable<K,Pair<U,V>> | join(PTable<K,U> left, PTable<K,V> right, JoinType joinType) - Join two tables with the given join type. |

| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Constructor Detail |
|---|
public BloomFilterJoinStrategy(int numElements)
The DefaultJoinStrategy will be used to perform the actual join after filtering.
numElements - expected number of unique keys
public BloomFilterJoinStrategy(int numElements,
float falsePositiveRate)
The DefaultJoinStrategy will be used to perform the actual join after filtering.
numElements - expected number of unique keys
falsePositiveRate - acceptable false positive rate for Bloom Filter
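For intuition about how `numElements` and `falsePositiveRate` trade off, the textbook Bloom filter sizing formulas are m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below simply evaluates them. Whether Crunch sizes its filter exactly this way is not stated on this page, and the class and method names are hypothetical.

```java
/** Hypothetical helper that evaluates the standard Bloom filter sizing formulas. */
final class BloomSizingSketch {

    // Number of bits: m = ceil(-n * ln(p) / (ln 2)^2).
    static long optimalNumBits(int numElements, float falsePositiveRate) {
        if (numElements <= 0 || !(falsePositiveRate > 0f && falsePositiveRate < 1f)) {
            throw new IllegalArgumentException("need numElements > 0 and 0 < rate < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-numElements * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    // Number of hash functions: k = round((m / n) * ln 2), and at least 1.
    static int optimalNumHashes(int numElements, long numBits) {
        return Math.max(1, (int) Math.round((double) numBits / numElements * Math.log(2)));
    }

    public static void main(String[] args) {
        int n = 1000;
        float p = 0.01f;
        long bits = optimalNumBits(n, p);
        System.out.println("bits=" + bits + ", hashes=" + optimalNumHashes(n, bits));
    }
}
```

Under these formulas, lowering the acceptable false positive rate grows the filter logarithmically, while the size grows linearly with the expected number of unique keys. Underestimating `numElements` therefore raises the effective false positive rate.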
public BloomFilterJoinStrategy(int numElements,
float falsePositiveRate,
JoinStrategy<K,U,V> delegateJoinStrategy)
numElements - expected number of unique keys
falsePositiveRate - acceptable false positive rate for Bloom Filter
delegateJoinStrategy - join strategy to delegate to after filtering

| Method Detail |
|---|
public PTable<K,Pair<U,V>> join(PTable<K,U> left,
PTable<K,V> right,
JoinType joinType)
Description copied from interface: JoinStrategy
Join two tables with the given join type.
Specified by: join in interface JoinStrategy<K,U,V>
left - left table to be joined
right - right table to be joined
joinType - type of join to perform
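The "filter first, then delegate" shape of this join can be sketched with in-memory lists. This is an illustration under loud assumptions: `ListJoin`, `nestedLoopJoin`, and `filtering` are hypothetical names, only an inner join is modelled, and an exact `HashSet` stands in for the Bloom filter, so no false positives occur here.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of a filtering join that wraps a delegate join. */
final class DelegatingJoinSketch {

    /** Stand-in for a join strategy over in-memory lists (inner join only). */
    interface ListJoin<K, U, V> {
        List<Map.Entry<K, Map.Entry<U, V>>> join(
                List<Map.Entry<K, U>> left, List<Map.Entry<K, V>> right);
    }

    /** Plain nested-loop inner join, playing the role of the delegate. */
    static <K, U, V> ListJoin<K, U, V> nestedLoopJoin() {
        return (left, right) -> {
            List<Map.Entry<K, Map.Entry<U, V>>> out = new ArrayList<>();
            for (Map.Entry<K, U> l : left) {
                for (Map.Entry<K, V> r : right) {
                    if (l.getKey().equals(r.getKey())) {
                        Map.Entry<U, V> pair =
                                new SimpleImmutableEntry<>(l.getValue(), r.getValue());
                        out.add(new SimpleImmutableEntry<>(l.getKey(), pair));
                    }
                }
            }
            return out;
        };
    }

    /** Drops right-side pairs whose key is absent on the left, then delegates. */
    static <K, U, V> ListJoin<K, U, V> filtering(ListJoin<K, U, V> delegate) {
        return (left, right) -> {
            Set<K> leftKeys = new HashSet<>(); // exact set standing in for the Bloom filter
            for (Map.Entry<K, U> l : left) {
                leftKeys.add(l.getKey());
            }
            List<Map.Entry<K, V>> kept = new ArrayList<>();
            for (Map.Entry<K, V> r : right) {
                if (leftKeys.contains(r.getKey())) {
                    kept.add(r);
                }
            }
            return delegate.join(left, kept);
        };
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> left = new ArrayList<>();
        left.add(new SimpleImmutableEntry<>("a", 1));
        List<Map.Entry<String, String>> right = new ArrayList<>();
        right.add(new SimpleImmutableEntry<>("a", "x"));
        right.add(new SimpleImmutableEntry<>("c", "y"));
        ListJoin<String, Integer, String> join = filtering(nestedLoopJoin());
        System.out.println(join.join(left, right));
    }
}
```

Because the filter only removes right-side pairs that have no matching left key, the inner join result is the same with or without it; the wrapper changes how much data reaches the delegate, not what the delegate returns.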