public class ShardedJoinStrategy<K,U,V> extends Object implements JoinStrategy<K,U,V>
This strategy is useful when there are multiple values per key on at least one side of the join, and a large proportion of the values are mapped to a small number of keys.
Using this strategy will increase the number of keys being joined, but can increase performance by spreading processing of a single key over multiple reduce groups.
A custom ShardedJoinStrategy.ShardingStrategy can be provided so that only certain keys are sharded, or
keys can be sharded in accordance with how many values are mapped to them.
| Modifier and Type | Class and Description |
|---|---|
static interface |
ShardedJoinStrategy.ShardingStrategy<K>
Determines over how many shards a key will be split in a sharded join.
|
| Constructor and Description |
|---|
ShardedJoinStrategy(int numShards)
Instantiate with a constant number of shards to use for all keys.
|
ShardedJoinStrategy(ShardedJoinStrategy.ShardingStrategy<K> shardingStrategy)
Instantiate with a custom sharding strategy.
|
public ShardedJoinStrategy(int numShards)
numShards - number of shards to usepublic ShardedJoinStrategy(ShardedJoinStrategy.ShardingStrategy<K> shardingStrategy)
shardingStrategy - strategy to be used for shardingpublic PTable<K,Pair<U,V>> join(PTable<K,U> left, PTable<K,V> right, JoinType joinType)
JoinStrategyjoin in interface JoinStrategy<K,U,V>left - left table to be joinedright - right table to be joinedjoinType - type of join to performCopyright © 2015 The Apache Software Foundation. All Rights Reserved.