This project has retired. For details please refer to its Attic page.
OneToManyJoin (Apache Crunch 0.9.0 API)

org.apache.crunch.lib.join
Class OneToManyJoin

java.lang.Object
  extended by org.apache.crunch.lib.join.OneToManyJoin

public class OneToManyJoin
extends Object

Optimized join for situations where exactly one value per key is joined with any number of other values that share that key.
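This page does not say how the join is optimized. One common way to implement a one-to-many join works like this: tag each left record with 0 and each right record with 1, then sort by (key, tag) so that the single left value arrives first in each key group. The right values can then be handed to the post-processing function one at a time instead of being buffered. This approach is assumed here for illustration only.

The plain-Java sketch below models that idea in memory. `OneToManyJoinSketch`, `PostProcess`, and `join` are hypothetical names, not Crunch classes. Dropping keys that have no left value is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory model of a tag-and-sort one-to-many join.
// It illustrates the idea only; it is not Crunch's implementation.
final class OneToManyJoinSketch {

    // Plays the role of the postProcessFn: one left value and an iterable of
    // right values in, zero or more outputs out through the emitter.
    interface PostProcess<U, V, T> {
        void process(U left, Iterable<V> rights, Consumer<T> emitter);
    }

    // A record tagged with its side: 0 = left (the "one"), 1 = right (the "many").
    private static final class Tagged<K> {
        final K key;
        final int tag;
        final Object value;

        Tagged(K key, int tag, Object value) {
            this.key = key;
            this.tag = tag;
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    static <K extends Comparable<K>, U, V, T> List<T> join(
            List<Map.Entry<K, U>> left, List<Map.Entry<K, V>> right, PostProcess<U, V, T> fn) {
        List<Tagged<K>> records = new ArrayList<>();
        for (Map.Entry<K, U> e : left) records.add(new Tagged<>(e.getKey(), 0, e.getValue()));
        for (Map.Entry<K, V> e : right) records.add(new Tagged<>(e.getKey(), 1, e.getValue()));
        // Sort by (key, tag) so that each key group starts with its left record(s).
        records.sort(Comparator.comparing((Tagged<K> r) -> r.key).thenComparingInt(r -> r.tag));

        List<T> out = new ArrayList<>();
        int start = 0;
        while (start < records.size()) {
            K key = records.get(start).key;
            int end = start;
            while (end < records.size() && records.get(end).key.compareTo(key) == 0) end++;
            // Assumption of this sketch: keys with no left value produce no output.
            if (records.get(start).tag == 0) {
                U leftValue = (U) records.get(start).value; // any further left values are skipped
                int firstRight = start;
                while (firstRight < end && records.get(firstRight).tag == 0) firstRight++;
                List<Tagged<K>> group = records.subList(firstRight, end);
                // Right values are handed over lazily, one at a time, never copied.
                Iterable<V> rights = () -> new Iterator<V>() {
                    private final Iterator<Tagged<K>> it = group.iterator();

                    @Override public boolean hasNext() { return it.hasNext(); }

                    @Override public V next() { return (V) it.next().value; }
                };
                fn.process(leftValue, rights, out::add);
            }
            start = end;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> customers = List.of(Map.entry("c1", "Alice"), Map.entry("c2", "Bob"));
        List<Map.Entry<String, Integer>> orders =
                List.of(Map.entry("c1", 30), Map.entry("c2", 5), Map.entry("c1", 12));
        List<String> totals = join(customers, orders,
                (String name, Iterable<Integer> amounts, Consumer<String> emit) -> {
                    int total = 0;
                    for (int amount : amounts) total += amount;
                    emit.accept(name + "=" + total);
                });
        System.out.println(totals); // [Alice=42, Bob=5]
    }
}
```

The lambda in `main` plays the part of the DoFn passed as postProcessFn. Because the left value is guaranteed to come first in each group, the model never has to hold more than one left value per key.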


Constructor Summary
OneToManyJoin()

Method Summary
static <K,U,V,T> PCollection<T>
oneToManyJoin(PTable<K,U> left, PTable<K,V> right, DoFn<Pair<U,Iterable<V>>,T> postProcessFn, PType<T> ptype)
          Performs a join on two tables, where the left table only contains a single value per key.

static <K,U,V,T> PCollection<T>
oneToManyJoin(PTable<K,U> left, PTable<K,V> right, DoFn<Pair<U,Iterable<V>>,T> postProcessFn, PType<T> ptype, int numReducers)
          Supports a user-specified number of reducers for the one-to-many join.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

OneToManyJoin

public OneToManyJoin()

Method Detail

oneToManyJoin

public static <K,U,V,T> PCollection<T> oneToManyJoin(PTable<K,U> left,
                                                     PTable<K,V> right,
                                                     DoFn<Pair<U,Iterable<V>>,T> postProcessFn,
                                                     PType<T> ptype)
Performs a join on two tables, where the left table only contains a single value per key.

This method accepts a DoFn, which is responsible for converting the single left-side value and the iterable of right-side values into output values.
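Below is a minimal sketch of what such a function does, written against the JDK only. In it, `Map.Entry<String, Iterable<String>>` stands in for Crunch's `Pair<U,Iterable<V>>`, and `Consumer<String>` stands in for the Emitter that a DoFn's process method writes to. `DenormalizeFn` is a hypothetical name; a real postProcessFn would extend Crunch's DoFn instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a postProcessFn, using only the JDK:
// Map.Entry<String, Iterable<String>> plays the role of Pair<U,Iterable<V>>,
// and Consumer<String> plays the role of the DoFn's Emitter.
final class DenormalizeFn {

    // Emits one output per right-side value, each prefixed with the left value.
    // Like a DoFn, it may emit zero, one, or many outputs for a single input.
    static void process(Map.Entry<String, Iterable<String>> input, Consumer<String> emitter) {
        String customer = input.getKey();
        for (String order : input.getValue()) {
            emitter.accept(customer + "," + order);
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        Iterable<String> orders = List.of("order-1", "order-2");
        process(Map.entry("Alice", orders), out::add);
        System.out.println(out); // [Alice,order-1, Alice,order-2]
    }
}
```

An empty iterable of right-side values simply produces no output in this sketch.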

This join is useful when a single context value is related to a large number of values that must all be brought together, and the right-side values for a key are too numerous to fit in memory.
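The right-side values are exposed as an Iterable rather than a collection. A post-processing function can therefore aggregate them in one pass with constant memory.

The sketch below assumes a hypothetical generator, `lazyValues`, as a stand-in for the right-side values of a single key. The `summarize` method stores only a count and a running sum, never the values themselves. In a MapReduce reducer such an Iterable should generally be treated as single-pass, so the sketch iterates it exactly once.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Single-pass, constant-memory aggregation over a large Iterable.
// lazyValues is a hypothetical stand-in for the right-side values of one key:
// it produces 1..n on demand instead of storing them.
final class StreamingAggregate {

    static Iterable<Long> lazyValues(long n) {
        return () -> new Iterator<Long>() {
            private long current = 1;

            @Override public boolean hasNext() { return current <= n; }

            @Override public Long next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current++;
            }
        };
    }

    // Reads the values exactly once and keeps only a count and a running sum,
    // so memory use does not grow with the number of right-side values.
    static String summarize(String leftValue, Iterable<Long> rightValues) {
        long count = 0;
        long sum = 0;
        for (long v : rightValues) {
            count++;
            sum += v;
        }
        return leftValue + ": count=" + count + ", sum=" + sum;
    }

    public static void main(String[] args) {
        System.out.println(summarize("ctx", lazyValues(1_000_000)));
        // ctx: count=1000000, sum=500000500000
    }
}
```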

If there are multiple values for the same key in the left-side table, only a single one will be used.
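This page does not say which of several left values is kept. If the choice matters, one option is to collapse the left side to one value per key with an explicit rule before joining.

The sketch below does this on plain Java lists. `DedupLeft` and `onePerKey` are hypothetical names. In a pipeline, the equivalent reduction would be applied to the left PTable before calling oneToManyJoin.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical pre-processing step: collapse the left side to exactly one value
// per key with an explicit rule, instead of relying on which duplicate the join keeps.
final class DedupLeft {

    static <K, U> Map<K, U> onePerKey(List<Map.Entry<K, U>> left, BinaryOperator<U> pick) {
        Map<K, U> result = new LinkedHashMap<>();
        for (Map.Entry<K, U> e : left) {
            // merge() applies pick(existing, incoming) when the key is already present
            result.merge(e.getKey(), e.getValue(), pick);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> versions = List.of(
                Map.entry("c1", 3), Map.entry("c1", 7), Map.entry("c2", 1));
        // Keep the highest version number per key.
        System.out.println(onePerKey(versions, Integer::max)); // {c1=7, c2=1}
    }
}
```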

Parameters:
left - left-side table to join
right - right-side table to join
postProcessFn - DoFn to process the results of the join
ptype - type of the output of the postProcessFn
Returns:
the post-processed output of the join

oneToManyJoin

public static <K,U,V,T> PCollection<T> oneToManyJoin(PTable<K,U> left,
                                                     PTable<K,V> right,
                                                     DoFn<Pair<U,Iterable<V>>,T> postProcessFn,
                                                     PType<T> ptype,
                                                     int numReducers)
Supports a user-specified number of reducers for the one-to-many join. Otherwise it behaves like the four-argument oneToManyJoin.

Parameters:
left - left-side table to join
right - right-side table to join
postProcessFn - DoFn to process the results of the join
ptype - type of the output of the postProcessFn
numReducers - the number of reducers to use
Returns:
the post-processed output of the join
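For any reduce-side join to be correct, every left and right record with the same key must reach the same reducer, however many reducers are configured. The sketch below assumes hash partitioning on the join key alone, using the formula of Hadoop's `HashPartitioner`: `(hashCode() & Integer.MAX_VALUE) % numReducers`. Whether Crunch partitions exactly this way is not stated here. `JoinPartitioning` and `partitionFor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: hash-partition records on the join key only,
// so the left and right records for one key always meet at the same reducer.
final class JoinPartitioning {

    // Same formula as Hadoop's HashPartitioner; masking the sign bit keeps the
    // result in [0, numReducers) even when hashCode() is negative.
    static int partitionFor(Object joinKey, int numReducers) {
        return (joinKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 3;
        // Each record is {joinKey, side}; the side tag is deliberately not hashed.
        String[][] records = {{"c1", "left"}, {"c1", "right"}, {"c2", "left"}, {"c2", "right"}, {"c1", "right"}};
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String[] r : records) {
            byReducer.computeIfAbsent(partitionFor(r[0], numReducers), p -> new ArrayList<>())
                    .add(r[0] + "/" + r[1]);
        }
        System.out.println(byReducer);
        // {1=[c1/left, c1/right, c1/right], 2=[c2/left, c2/right]}
    }
}
```

Under this assumption, raising numReducers spreads different keys over more reducers. All values for one key still go to a single reducer, so one very large key is not split.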


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.