Class ShardedJoinStrategy<K,U,V>

java.lang.Object
  extended by org.apache.crunch.lib.join.ShardedJoinStrategy<K,U,V>
All Implemented Interfaces:
Serializable, JoinStrategy<K,U,V>

public class ShardedJoinStrategy<K,U,V>
extends Object
implements JoinStrategy<K,U,V>

JoinStrategy that splits the key space up into shards.

This strategy is useful when there are multiple values per key on at least one side of the join, and a large proportion of the values are mapped to a small number of keys.

Using this strategy increases the number of keys being joined, but it can improve performance by spreading the processing of a single key over multiple reduce groups.

A custom ShardedJoinStrategy.ShardingStrategy can be provided so that only certain keys are sharded, or keys can be sharded in accordance with how many values are mapped to them.
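
To make the idea concrete, the following is a minimal plain-Java model of a sharded inner join. It is a sketch for illustration, not Crunch's implementation: it assumes that each left-side record is assigned to exactly one shard of its key while each right-side record is copied to every shard. The class ShardedJoinSketch, its methods, and the round-robin shard assignment are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; none of these names come from Crunch.
public class ShardedJoinSketch {

    // Left side: each record lands in exactly one (key, shard) group.
    // Round-robin assignment is an assumption made here for determinism.
    public static Map<Map.Entry<String, Integer>, List<String>> shardLeft(
            List<Map.Entry<String, String>> left, int numShards) {
        Map<Map.Entry<String, Integer>, List<String>> groups = new HashMap<>();
        Map<String, Integer> seenPerKey = new HashMap<>();
        for (Map.Entry<String, String> record : left) {
            int shard = seenPerKey.merge(record.getKey(), 1, Integer::sum) % numShards;
            groups.computeIfAbsent(Map.entry(record.getKey(), shard), k -> new ArrayList<>())
                  .add(record.getValue());
        }
        return groups;
    }

    // Right side: each record is copied into every shard of its key,
    // so that any left-side shard can find its matches.
    public static Map<Map.Entry<String, Integer>, List<String>> replicateRight(
            List<Map.Entry<String, String>> right, int numShards) {
        Map<Map.Entry<String, Integer>, List<String>> groups = new HashMap<>();
        for (Map.Entry<String, String> record : right) {
            for (int shard = 0; shard < numShards; shard++) {
                groups.computeIfAbsent(Map.entry(record.getKey(), shard), k -> new ArrayList<>())
                      .add(record.getValue());
            }
        }
        return groups;
    }

    // Inner join on the composite (key, shard), then drop the shard again.
    // Each result row is [key, leftValue, rightValue].
    public static List<List<String>> innerJoin(
            List<Map.Entry<String, String>> left,
            List<Map.Entry<String, String>> right,
            int numShards) {
        Map<Map.Entry<String, Integer>, List<String>> l = shardLeft(left, numShards);
        Map<Map.Entry<String, Integer>, List<String>> r = replicateRight(right, numShards);
        List<List<String>> joined = new ArrayList<>();
        for (Map.Entry<Map.Entry<String, Integer>, List<String>> group : l.entrySet()) {
            List<String> matches = r.getOrDefault(group.getKey(), List.of());
            for (String leftValue : group.getValue()) {
                for (String rightValue : matches) {
                    joined.add(List.of(group.getKey().getKey(), leftValue, rightValue));
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> left = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            left.add(Map.entry("hot", "l" + i)); // one heavily used key
        }
        left.add(Map.entry("cold", "c0"));
        List<Map.Entry<String, String>> right =
                List.of(Map.entry("hot", "r0"), Map.entry("cold", "r1"));

        System.out.println("groups on the left: " + shardLeft(left, 3).size());
        System.out.println("joined rows: " + innerJoin(left, right, 3).size());
    }
}
```

In this model, six left values for one key and three shards produce three groups of two values instead of one group of six. The join result is the same as without sharding, at the cost of replicating each matching right-side value three times.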

Nested Class Summary
static interface ShardedJoinStrategy.ShardingStrategy<K>
          Determines over how many shards a key will be split in a sharded join.
Constructor Summary
ShardedJoinStrategy(int numShards)
          Instantiate with a constant number of shards to use for all keys.
ShardedJoinStrategy(ShardedJoinStrategy.ShardingStrategy<K> shardingStrategy)
          Instantiate with a custom sharding strategy.
Method Summary
 PTable<K,Pair<U,V>> join(PTable<K,U> left, PTable<K,V> right, JoinType joinType)
          Join two tables with the given join type.
Constructor Detail


public ShardedJoinStrategy(int numShards)
Instantiate with a constant number of shards to use for all keys.

Parameters:
numShards - number of shards to use


public ShardedJoinStrategy(ShardedJoinStrategy.ShardingStrategy<K> shardingStrategy)
Instantiate with a custom sharding strategy.

Parameters:
shardingStrategy - strategy to be used for sharding
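
As a rough illustration of a custom strategy that shards keys according to how many values are mapped to them, the sketch below derives a per-key shard count from known value counts. It is self-contained and does not use Crunch: the nested ShardingStrategy interface is a hypothetical stand-in for ShardedJoinStrategy.ShardingStrategy<K>, and its method name getNumShards is an assumption, since this page does not list the interface's methods. The class names and the maxValuesPerShard parameter are likewise invented for the example.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; check the ShardingStrategy documentation for the real contract.
public class SkewAwareShardingSketch {

    // Hypothetical stand-in for ShardedJoinStrategy.ShardingStrategy<K>.
    // Assumed to be Serializable because a strategy would be shipped to tasks.
    public interface ShardingStrategy<K> extends Serializable {
        int getNumShards(K key);
    }

    // Shards a key in proportion to its value count; unknown keys get one shard.
    public static class HotKeySharding implements ShardingStrategy<String> {
        private static final long serialVersionUID = 1L;
        private final HashMap<String, Integer> shardsByKey;

        private HotKeySharding(HashMap<String, Integer> shardsByKey) {
            this.shardsByKey = shardsByKey;
        }

        // Builds a strategy from value counts per key, aiming for at most
        // maxValuesPerShard values in each shard (rounded up, minimum one shard).
        public static HotKeySharding fromCounts(Map<String, Long> valueCounts,
                                                long maxValuesPerShard) {
            if (maxValuesPerShard <= 0) {
                throw new IllegalArgumentException("maxValuesPerShard must be positive");
            }
            HashMap<String, Integer> shards = new HashMap<>();
            for (Map.Entry<String, Long> e : valueCounts.entrySet()) {
                long needed = (e.getValue() + maxValuesPerShard - 1) / maxValuesPerShard;
                shards.put(e.getKey(), (int) Math.max(1L, needed));
            }
            return new HotKeySharding(shards);
        }

        @Override
        public int getNumShards(String key) {
            return shardsByKey.getOrDefault(key, 1);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("hot", 1000L);
        counts.put("cold", 3L);
        ShardingStrategy<String> strategy = HotKeySharding.fromCounts(counts, 100L);
        System.out.println("hot -> " + strategy.getNumShards("hot"));
        System.out.println("cold -> " + strategy.getNumShards("cold"));
    }
}
```

A strategy of this shape leaves lightly used keys in a single shard, so the extra replication cost of sharding is paid only for the keys that are actually skewed.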
Method Detail


public PTable<K,Pair<U,V>> join(PTable<K,U> left,
                                PTable<K,V> right,
                                JoinType joinType)
Description copied from interface: JoinStrategy
Join two tables with the given join type.

Specified by:
join in interface JoinStrategy<K,U,V>
Parameters:
left - left table to be joined
right - right table to be joined
joinType - type of join to perform
Returns:
joined tables

