public class ShardedJoinStrategy<K,U,V> extends Object implements JoinStrategy<K,U,V>
This strategy is useful when there are multiple values per key on at least one side of the join, and a large proportion of the values are mapped to a small number of keys.
Using this strategy will increase the number of keys being joined, but can increase performance by spreading processing of a single key over multiple reduce groups.
A custom ShardedJoinStrategy.ShardingStrategy
can be provided so that only certain keys are sharded, or
keys can be sharded in accordance with how many values are mapped to them.
Modifier and Type | Class and Description |
---|---|
static interface |
ShardedJoinStrategy.ShardingStrategy<K>
Determines over how many shards a key will be split in a sharded join.
|
Constructor and Description |
---|
ShardedJoinStrategy(int numShards)
Instantiate with a constant number of shards to use for all keys.
|
ShardedJoinStrategy(int numShards,
int numReducers)
Instantiate with a constant number of shards to use for all keys.
|
ShardedJoinStrategy(ShardedJoinStrategy.ShardingStrategy<K> shardingStrategy)
Instantiate with a custom sharding strategy.
|
ShardedJoinStrategy(ShardedJoinStrategy.ShardingStrategy<K> shardingStrategy,
int numReducers)
Instantiate with a custom sharding strategy and a specified number of reducers.
|
public ShardedJoinStrategy(int numShards)
numShards
- number of shards to usepublic ShardedJoinStrategy(int numShards, int numReducers)
numShards
- number of shards to usenumReducers
- the amount of reducers to run the join withpublic ShardedJoinStrategy(ShardedJoinStrategy.ShardingStrategy<K> shardingStrategy)
shardingStrategy
- strategy to be used for shardingpublic ShardedJoinStrategy(ShardedJoinStrategy.ShardingStrategy<K> shardingStrategy, int numReducers)
shardingStrategy
- strategy to be used for shardingnumReducers
- the amount of reducers to run the join withpublic PTable<K,Pair<U,V>> join(PTable<K,U> left, PTable<K,V> right, JoinType joinType)
JoinStrategy
join
in interface JoinStrategy<K,U,V>
left
- left table to be joinedright
- right table to be joinedjoinType
- type of join to performCopyright © 2016 The Apache Software Foundation. All rights reserved.