Class MapsideJoin

java.lang.Object
  extended by org.apache.crunch.lib.join.MapsideJoin

public class MapsideJoin
extends Object

Utility for performing map-side joins on a common key between two PTables.

A map-side join is an optimized join that doesn't use a reducer; instead, the right side of the join is loaded into memory and the join is performed in a mapper. An important implication of this style of join is that, because no shuffle-and-sort phase takes place, its output is not sorted, unlike the output of a conventional (reducer-based) join.
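The mechanism described above can be sketched in plain Java, with no Crunch or Hadoop dependency, as an in-memory hash join: the whole right side is first read into a map, and the left side is then streamed one record at a time, as a mapper would receive it, probing that map for each key. This is an illustrative sketch under those assumptions, not Crunch's implementation; the class MapSideJoinSketch, its nested Joined type, and the use of lists of Map.Entry in place of PTable and Pair are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free sketch of a map-side (in-memory hash) join.
// It mirrors the shape of MapsideJoin.join but uses plain Java collections,
// not Crunch's PTable or Pair types.
final class MapSideJoinSketch {

    // Hypothetical stand-in for one output record: the join key and a (left, right) pair.
    static final class Joined<K, U, V> {
        final K key;
        final U left;
        final V right;

        Joined(K key, U left, V right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return key + "=(" + left + "," + right + ")";
        }
    }

    static <K, U, V> List<Joined<K, U, V>> join(List<Map.Entry<K, U>> left,
                                                List<Map.Entry<K, V>> right) {
        // Step 1: read the entire right side into memory, grouping values by key.
        // This is the step that requires the right side to fit in mapper memory.
        Map<K, List<V>> rightByKey = new HashMap<>();
        for (Map.Entry<K, V> e : right) {
            rightByKey.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }

        // Step 2: stream the left side record by record, as a mapper would,
        // probing the in-memory map for each key. No shuffle or sort happens.
        List<Joined<K, U, V>> out = new ArrayList<>();
        for (Map.Entry<K, U> e : left) {
            List<V> matches = rightByKey.get(e.getKey());
            if (matches == null) {
                continue; // inner join: a left key with no right-side match is dropped
            }
            for (V v : matches) {
                out.add(new Joined<>(e.getKey(), e.getValue(), v));
            }
        }
        // Output follows the order of the left input, not the order of the keys.
        return out;
    }
}
```

With left records (b,L1), (a,L2), (c,L3), (b,L4) and right records (a,R1), (b,R2), (d,R3), the sketch yields b=(L1,R2), a=(L2,R1), b=(L4,R2): key c has no right-side match and key d has no left-side match, so neither appears, and key b comes before a because the output follows the left input rather than being sorted.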

Note: This utility is only supported when running with an MRPipeline as the pipeline.

Constructor Summary
MapsideJoin()

Method Summary
static <K,U,V> PTable<K,Pair<U,V>>
join(PTable<K,U> left, PTable<K,V> right)
          Join two tables using a map-side join.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

public MapsideJoin()

Method Detail

public static <K,U,V> PTable<K,Pair<U,V>> join(PTable<K,U> left,
                                               PTable<K,V> right)
Join two tables using a map-side join. The right-side table is loaded fully into memory, so this method should only be used if the right-side table's contents fit in the memory allocated to mappers. The join performed by this method is an inner join: only keys present in both tables appear in the output.

Parameters:
left - The left-side table of the join
right - The right-side table of the join, whose contents will be fully read into memory
Returns:
A table keyed on the join key, containing pairs of joined values
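Whether the right side fits in mapper memory can be estimated with simple arithmetic before choosing this method. The helper below is a hypothetical rule of thumb, not part of Crunch or Hadoop: the class RightSideSizeCheck is illustrative, and both the in-memory overhead factor (deserialized Java objects are usually larger than their serialized form) and the fraction of the mapper heap treated as usable are assumptions the caller must supply.

```java
// Hypothetical back-of-the-envelope check of whether a right-side table is a
// reasonable candidate for a map-side join. All inputs are caller-supplied estimates;
// none of them correspond to a Crunch or Hadoop setting.
final class RightSideSizeCheck {

    static boolean fitsInMapperHeap(long rightRecords,
                                    long avgSerializedBytesPerRecord,
                                    double inMemoryOverheadFactor,
                                    long mapperHeapBytes,
                                    double usableHeapFraction) {
        // Estimated in-memory size: record count x serialized size x assumed object overhead.
        double estimatedBytes =
            (double) rightRecords * avgSerializedBytesPerRecord * inMemoryOverheadFactor;
        // Only part of the heap is available; the rest is needed by the mapper itself.
        double budgetBytes = mapperHeapBytes * usableHeapFraction;
        return estimatedBytes <= budgetBytes;
    }
}
```

For example, 1,000,000 records of about 100 serialized bytes each, with an assumed 3x overhead, come to roughly 300 MB, which fits in half of a 1 GiB heap (about 537 MB); 5,000,000 such records (about 1.5 GB) do not.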

Copyright © 2012 The Apache Software Foundation. All Rights Reserved.