This project has retired. For details please refer to its Attic page.
MapsideJoinStrategy (Apache Crunch 0.9.0 API)

org.apache.crunch.lib.join
Class MapsideJoinStrategy<K,U,V>

java.lang.Object
  extended by org.apache.crunch.lib.join.MapsideJoinStrategy<K,U,V>
All Implemented Interfaces:
Serializable, JoinStrategy<K,U,V>

public class MapsideJoinStrategy<K,U,V>
extends Object
implements JoinStrategy<K,U,V>

Utility for doing map side joins on a common key between two PTables.

A map side join is an optimized join which doesn't use a reducer; instead, the right side of the join is loaded into memory and the join is performed in a mapper. An important implication of this style of join is that its output is not sorted, unlike the output of a conventional (reducer-based) join.
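The idea can be illustrated with a minimal, self-contained sketch in plain Java. This is not Crunch's implementation and uses no Crunch classes; the class MapSideJoinSketch and its helpers entry and innerJoin are hypothetical names introduced only for illustration. It assumes the right side fits in memory, indexes it in a HashMap, and then streams the left side past that index, so results come out in left-side input order rather than in key order.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an in-memory (map-side style) inner join.
class MapSideJoinSketch {

    // Small helper so call sites get a Map.Entry<K, V> without repeating type arguments.
    static <K, V> Map.Entry<K, V> entry(K key, V value) {
        return new SimpleEntry<K, V>(key, value);
    }

    static <K, U, V> List<Map.Entry<K, Map.Entry<U, V>>> innerJoin(
            List<Map.Entry<K, U>> left, List<Map.Entry<K, V>> right) {
        // Build phase: load the whole right side into memory, grouped by key
        // so that duplicate keys are preserved.
        Map<K, List<V>> rightIndex = new HashMap<K, List<V>>();
        for (Map.Entry<K, V> r : right) {
            List<V> values = rightIndex.get(r.getKey());
            if (values == null) {
                values = new ArrayList<V>();
                rightIndex.put(r.getKey(), values);
            }
            values.add(r.getValue());
        }

        // Probe phase: stream the left side and emit one pair per matching right value.
        // No shuffle or sort happens, so output order follows the left input.
        List<Map.Entry<K, Map.Entry<U, V>>> joined =
                new ArrayList<Map.Entry<K, Map.Entry<U, V>>>();
        for (Map.Entry<K, U> l : left) {
            List<V> matches = rightIndex.get(l.getKey());
            if (matches == null) {
                continue; // inner join: drop left rows without a match
            }
            for (V v : matches) {
                joined.add(new SimpleEntry<K, Map.Entry<U, V>>(
                        l.getKey(), new SimpleEntry<U, V>(l.getValue(), v)));
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> left = new ArrayList<Map.Entry<String, String>>();
        left.add(entry("b", "left-b"));
        left.add(entry("a", "left-a"));
        left.add(entry("c", "left-c"));

        List<Map.Entry<String, Integer>> right = new ArrayList<Map.Entry<String, Integer>>();
        right.add(entry("a", 1));
        right.add(entry("b", 2));
        right.add(entry("b", 3));

        // Keys are emitted in the order b, b, a: left input order, not sorted order.
        for (Map.Entry<String, Map.Entry<String, Integer>> row : innerJoin(left, right)) {
            System.out.println(row.getKey() + " -> ("
                    + row.getValue().getKey() + ", " + row.getValue().getValue() + ")");
        }
    }
}
```

In this sketch the memory cost grows with the size of the right side only, which is why the smaller table is the natural candidate for the right-hand argument.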

See Also:
Serialized Form

Constructor Summary
MapsideJoinStrategy()
          Constructs a new instance of the MapsideJoinStrategy, materializing the right-side join table to disk before the join is performed.
MapsideJoinStrategy(boolean materialize)
          Constructs a new instance of the MapsideJoinStrategy.
 
Method Summary
 PTable<K,Pair<U,V>> join(PTable<K,U> left, PTable<K,V> right, JoinType joinType)
          Join two tables with the given join type.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MapsideJoinStrategy

public MapsideJoinStrategy()
Constructs a new instance of the MapsideJoinStrategy, materializing the right-side join table to disk before the join is performed.


MapsideJoinStrategy

public MapsideJoinStrategy(boolean materialize)
Constructs a new instance of the MapsideJoinStrategy. If the materialize argument is true, then the right-side join PTable will be materialized to disk before the in-memory join is performed. If it is false, then Crunch can optionally read and process the data from the right-side table without having to run a job to materialize the data to disk first.

Parameters:
materialize - Whether or not to materialize the right-side table before the join
Method Detail

join

public PTable<K,Pair<U,V>> join(PTable<K,U> left,
                                PTable<K,V> right,
                                JoinType joinType)
Description copied from interface: JoinStrategy
Join two tables with the given join type.

Specified by:
join in interface JoinStrategy<K,U,V>
Parameters:
left - left table to be joined
right - right table to be joined
joinType - type of join to perform
Returns:
joined tables


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.