MapsideJoin (Apache Crunch 0.4.0-incubating API)

org.apache.crunch.lib.join
Class MapsideJoin

java.lang.Object
  extended by org.apache.crunch.lib.join.MapsideJoin

public class MapsideJoin
extends Object

Utility for doing map side joins on a common key between two PTables.

A map side join is an optimized join that doesn't use a reducer; instead, the right side of the join is loaded into memory and the join is performed in a mapper. An important implication of this style of join is that its output is not sorted, unlike the output of a conventional (reducer-based) join, which is sorted.

Note: This utility is supported only when running with an MRPipeline as the pipeline.
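
The mechanism described above can be illustrated without Hadoop or Crunch. The following is a minimal plain-Java sketch, not Crunch's implementation: the class MapsideJoinSketch, the Joined record type, and the sample data are all hypothetical. It assumes the right side fits in a HashMap, and it assumes that a key with several right-side values should produce one output record per value. Keys with no match on the right are dropped, which corresponds to an inner join.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this class is not part of Apache Crunch.
class MapsideJoinSketch {

    /** One joined record: the join key plus the matching left and right values. */
    static final class Joined<K, U, V> {
        final K key;
        final U left;
        final V right;

        Joined(K key, U left, V right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return key + "=[" + left + "," + right + "]";
        }
    }

    static <K, U, V> List<Joined<K, U, V>> join(List<Map.Entry<K, U>> left,
                                                List<Map.Entry<K, V>> right) {
        // Build phase: index the whole right side in memory by key.
        Map<K, List<V>> index = new HashMap<>();
        for (Map.Entry<K, V> r : right) {
            index.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }

        // Probe phase: stream the left side once, looking up each key.
        // No shuffle or sort happens, so output follows left-side input order.
        List<Joined<K, U, V>> out = new ArrayList<>();
        for (Map.Entry<K, U> l : left) {
            List<V> matches = index.get(l.getKey());
            if (matches == null) {
                continue; // no right-side match: dropped, as in an inner join
            }
            for (V v : matches) {
                out.add(new Joined<>(l.getKey(), l.getValue(), v));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> orders = new ArrayList<>();
        orders.add(new AbstractMap.SimpleImmutableEntry<>("carol", 30));
        orders.add(new AbstractMap.SimpleImmutableEntry<>("alice", 10));
        orders.add(new AbstractMap.SimpleImmutableEntry<>("dave", 40));
        orders.add(new AbstractMap.SimpleImmutableEntry<>("alice", 11));

        List<Map.Entry<String, String>> cities = new ArrayList<>();
        cities.add(new AbstractMap.SimpleImmutableEntry<>("alice", "NYC"));
        cities.add(new AbstractMap.SimpleImmutableEntry<>("carol", "LA"));

        // prints [carol=[30,LA], alice=[10,NYC], alice=[11,NYC]]
        System.out.println(join(orders, cities));
    }
}
```

In this sketch "carol" appears before "alice" in the result because the left side is read in that order, and "dave" is absent because the right side has no entry for that key.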


Constructor Summary
MapsideJoin()
           
 
Method Summary
static <K,U,V> PTable<K,Pair<U,V>> join(PTable<K,U> left, PTable<K,V> right)
          Join two tables using a map side join.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MapsideJoin

public MapsideJoin()

Method Detail

join

public static <K,U,V> PTable<K,Pair<U,V>> join(PTable<K,U> left,
                                               PTable<K,V> right)
Join two tables using a map side join. The right-side table is loaded fully into memory, so use this method only if the right-side table's contents fit in the memory allocated to mappers. This method performs an inner join.

Parameters:
left - The left-side table of the join
right - The right-side table of the join, whose contents will be fully read into memory
Returns:
A table keyed on the join key, containing pairs of joined values


Copyright © 2012 The Apache Software Foundation. All Rights Reserved.