This project has retired. For details please refer to its Attic page.
ShardedJoinStrategy (Apache Crunch 0.10.0 API)

org.apache.crunch.lib.join
Class ShardedJoinStrategy<K,U,V>

java.lang.Object
  extended by org.apache.crunch.lib.join.ShardedJoinStrategy<K,U,V>
All Implemented Interfaces:
Serializable, JoinStrategy<K,U,V>

public class ShardedJoinStrategy<K,U,V>
extends Object
implements JoinStrategy<K,U,V>

JoinStrategy that splits the key space up into shards.

This strategy is useful when there are multiple values per key on at least one side of the join, and a large proportion of the values are mapped to a small number of keys.
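
To judge whether a data set matches the skew described above, one rough check is the share of all values held by the most frequent key. The following is a minimal stdlib-only sketch; the class `KeySkewSketch` and the method `topKeyShare` are hypothetical names introduced for illustration and are not part of Crunch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeySkewSketch {

    /**
     * Returns the fraction of all values that belong to the single most
     * frequent key. The input holds one key occurrence per value.
     */
    static <K> double topKeyShare(List<K> keys) {
        if (keys.isEmpty()) {
            return 0.0;
        }
        Map<K, Integer> counts = new HashMap<>();
        int max = 0;
        for (K key : keys) {
            // merge returns the updated count for this key.
            int count = counts.merge(key, 1, Integer::sum);
            max = Math.max(max, count);
        }
        return (double) max / keys.size();
    }
}
```

A share close to 1.0 suggests that a few reduce groups would do most of the work in an unsharded join; what threshold justifies sharding depends on the job and is not specified here.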

Using this strategy increases the number of keys being joined, but it can improve performance by spreading the processing of a single key over multiple reduce groups.

A custom ShardedJoinStrategy.ShardingStrategy can be provided so that only certain keys are sharded, or keys can be sharded in accordance with how many values are mapped to them.
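
The general idea of sharding a join key can be modeled with plain Java collections. The sketch below is a conceptual model only and is not taken from the Crunch source: it assumes left rows are dealt round-robin to one shard each while right rows are copied to every shard, and the class `ShardedJoinSketch` with its methods is hypothetical. Which side Crunch distributes and which it replicates is an assumption here.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ShardedJoinSketch {

    /** Convenience factory for one (key, value) row. */
    static <K, T> Map.Entry<K, T> row(K key, T value) {
        return new SimpleImmutableEntry<K, T>(key, value);
    }

    /**
     * Inner-joins two row lists on the composite key (key, shard).
     * Each left row lands in exactly one shard and each right row is
     * replicated to all shards, so every matching pair meets exactly once.
     */
    static <K, U, V> List<String> shardedInnerJoin(
            List<Map.Entry<K, U>> left, List<Map.Entry<K, V>> right, int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be at least 1");
        }
        // Group left values under (key, shard); each group models one reduce group.
        Map<Map.Entry<K, Integer>, List<U>> leftGroups = new HashMap<>();
        int next = 0;
        for (Map.Entry<K, U> l : left) {
            Map.Entry<K, Integer> shardedKey = new SimpleImmutableEntry<K, Integer>(l.getKey(), next);
            leftGroups.computeIfAbsent(shardedKey, k -> new ArrayList<>()).add(l.getValue());
            next = (next + 1) % numShards;
        }
        // Replicate each right row to every shard and pair it with that group's left values.
        List<String> joined = new ArrayList<>();
        for (Map.Entry<K, V> r : right) {
            for (int shard = 0; shard < numShards; shard++) {
                List<U> group = leftGroups.get(new SimpleImmutableEntry<K, Integer>(r.getKey(), shard));
                if (group == null) {
                    continue;
                }
                for (U u : group) {
                    // The shard index is dropped again in the output.
                    joined.add(r.getKey() + ":" + u + "," + r.getValue());
                }
            }
        }
        return joined;
    }

    /** Runs the join on a small skewed data set and returns the sorted result. */
    static List<String> demo(int numShards) {
        List<Map.Entry<String, Integer>> left = new ArrayList<>();
        left.add(row("hot", 1));
        left.add(row("hot", 2));
        left.add(row("hot", 3));
        left.add(row("cold", 4));
        List<Map.Entry<String, String>> right = new ArrayList<>();
        right.add(row("hot", "x"));
        right.add(row("cold", "y"));
        right.add(row("none", "z"));
        List<String> result = shardedInnerJoin(left, right, numShards);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(demo(2));
    }
}
```

In this model the joined rows are the same for any shard count; only the number of groups that share the work for the key `hot` changes. With Crunch itself, the corresponding call would take the shape `new ShardedJoinStrategy<K, U, V>(numShards).join(left, right, joinType)`, following the signatures documented on this page.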

See Also:
Serialized Form

Nested Class Summary
static interface ShardedJoinStrategy.ShardingStrategy<K>
          Determines over how many shards a key will be split in a sharded join.
 
Constructor Summary
ShardedJoinStrategy(int numShards)
          Instantiate with a constant number of shards to use for all keys.
ShardedJoinStrategy(ShardedJoinStrategy.ShardingStrategy<K> shardingStrategy)
          Instantiate with a custom sharding strategy.
 
Method Summary
 PTable<K,Pair<U,V>> join(PTable<K,U> left, PTable<K,V> right, JoinType joinType)
          Join two tables with the given join type.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ShardedJoinStrategy

public ShardedJoinStrategy(int numShards)
Instantiate with a constant number of shards to use for all keys.

Parameters:
numShards - number of shards to use

ShardedJoinStrategy

public ShardedJoinStrategy(ShardedJoinStrategy.ShardingStrategy<K> shardingStrategy)
Instantiate with a custom sharding strategy.

Parameters:
shardingStrategy - strategy to be used for sharding
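
The two uses of a custom strategy mentioned in the class description (sharding only certain keys, or sharding by value count) can be sketched as follows. This page does not list the methods of ShardedJoinStrategy.ShardingStrategy, so the interface `ShardCount` and its method `shardsFor` below are hypothetical stand-ins; check the interface's own documentation for the real signature before adapting this.

```java
import java.util.Map;

final class ShardingStrategySketch {

    /**
     * Hypothetical stand-in for ShardedJoinStrategy.ShardingStrategy<K>.
     * The real interface's method name and signature may differ.
     */
    interface ShardCount<K> {
        int shardsFor(K key);
    }

    /** Shards only the keys listed as hot; every other key stays in a single shard. */
    static <K> ShardCount<K> hotKeysOnly(Map<K, Integer> hotKeyShards) {
        return key -> hotKeyShards.getOrDefault(key, 1);
    }

    /**
     * Derives the shard count from how many values a key has:
     * one shard per valuesPerShard values, rounded up, with a minimum of 1.
     */
    static <K> ShardCount<K> byValueCount(Map<K, Long> valueCounts, long valuesPerShard) {
        return key -> {
            long count = valueCounts.getOrDefault(key, 0L);
            long shards = (count + valuesPerShard - 1) / valuesPerShard;
            return (int) Math.max(1L, shards);
        };
    }
}
```

Because ShardedJoinStrategy is listed as Serializable, a strategy passed to it presumably has to be serializable as well; the lambdas in this sketch make no such guarantee.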
Method Detail

join

public PTable<K,Pair<U,V>> join(PTable<K,U> left,
                                PTable<K,V> right,
                                JoinType joinType)
Description copied from interface: JoinStrategy
Join two tables with the given join type.

Specified by:
join in interface JoinStrategy<K,U,V>
Parameters:
left - left table to be joined
right - right table to be joined
joinType - type of join to perform
Returns:
the joined table


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.