This project has retired. For details please refer to its Attic page.
MapsideJoinStrategy (Apache Crunch 0.8.0 API)

org.apache.crunch.lib.join
Class MapsideJoinStrategy<K,U,V>

java.lang.Object
  extended by org.apache.crunch.lib.join.MapsideJoinStrategy<K,U,V>
All Implemented Interfaces:
Serializable, JoinStrategy<K,U,V>

public class MapsideJoinStrategy<K,U,V>
extends Object
implements JoinStrategy<K,U,V>

Utility for doing map side joins on a common key between two PTables.

A map-side join is an optimized join that does not use a reducer; instead, the right side of the join is loaded into memory and the join is performed in a mapper, so the right-side table must be small enough to fit in memory. An important implication of this style of join is that its output is not sorted, unlike the output of a conventional (reducer-based) join.
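The idea can be sketched in plain Java, without any Crunch or Hadoop classes. This is only an illustration of the technique, not Crunch's implementation: the class name MapsideJoinSketch, the method innerJoin, and the sample data are all hypothetical. The right side is indexed in a hash map, the left side is streamed past it, and the results come out in left-input order rather than in key order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an in-memory (map-side style) inner join.
// It does not use or reproduce any Crunch API.
public class MapsideJoinSketch {

    // Joins two lists of (key, value) pairs on their keys.
    public static List<String> innerJoin(List<Map.Entry<String, String>> left,
                                         List<Map.Entry<String, String>> right) {
        // Load the whole right side into memory, grouped by join key.
        Map<String, List<String>> rightByKey = new HashMap<>();
        for (Map.Entry<String, String> r : right) {
            rightByKey.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }

        // Stream the left side and probe the in-memory index.
        // No shuffle or sort takes place, so output follows left-input order.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> l : left) {
            for (String rightValue : rightByKey.getOrDefault(l.getKey(), List.of())) {
                out.add(l.getKey() + "=(" + l.getValue() + "," + rightValue + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> left = List.of(
            Map.entry("c", "left-c"), Map.entry("a", "left-a"), Map.entry("b", "left-b"));
        List<Map.Entry<String, String>> right = List.of(
            Map.entry("a", "right-a"), Map.entry("c", "right-c"));

        // Key "c" is emitted before "a" because it appears first on the left side.
        System.out.println(innerJoin(left, right));
        // prints [c=(left-c,right-c), a=(left-a,right-a)]
    }
}
```

A reducer-based join would instead group and sort records by key during the shuffle, which is why its output is ordered by key and the output here is not.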

See Also:
Serialized Form

Constructor Summary
MapsideJoinStrategy()
          Constructs a new instance of the MapsideJoinStrategy, materializing the right-side join table to disk before the join is performed.
MapsideJoinStrategy(boolean materialize)
          Constructs a new instance of the MapsideJoinStrategy.
 
Method Summary
 PTable<K,Pair<U,V>> join(PTable<K,U> left, PTable<K,V> right, JoinType joinType)
          Join two tables with the given join type.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MapsideJoinStrategy

public MapsideJoinStrategy()
Constructs a new instance of the MapsideJoinStrategy, materializing the right-side join table to disk before the join is performed.


MapsideJoinStrategy

public MapsideJoinStrategy(boolean materialize)
Constructs a new instance of the MapsideJoinStrategy. If the materialize argument is true, then the right-side join PTable will be materialized to disk before the in-memory join is performed. If it is false, then Crunch can optionally read and process the data from the right-side table directly, without first running a job to materialize it to disk.

Parameters:
materialize - Whether or not to materialize the right-side table before the join
Method Detail

join

public PTable<K,Pair<U,V>> join(PTable<K,U> left,
                                PTable<K,V> right,
                                JoinType joinType)
Description copied from interface: JoinStrategy
Join two tables with the given join type.

Specified by:
join in interface JoinStrategy<K,U,V>
Parameters:
left - left table to be joined
right - right table to be joined
joinType - type of join to perform
Returns:
joined tables


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.