This project has retired. For details please refer to its Attic page.
JoinFn (Apache Crunch 0.8.0 API)

org.apache.crunch.lib.join
Class JoinFn<K,U,V>

java.lang.Object
  extended by org.apache.crunch.DoFn<Pair<Pair<K,Integer>,Iterable<Pair<U,V>>>,Pair<K,Pair<U,V>>>
      extended by org.apache.crunch.lib.join.JoinFn<K,U,V>
Type Parameters:
K - Type of the keys.
U - Type of the first PTable's values.
V - Type of the second PTable's values.
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
FullOuterJoinFn, InnerJoinFn, LeftOuterJoinFn, RightOuterJoinFn

public abstract class JoinFn<K,U,V>
extends DoFn<Pair<Pair<K,Integer>,Iterable<Pair<U,V>>>,Pair<K,Pair<U,V>>>

Represents a DoFn for performing joins.

See Also:
Serialized Form

Constructor Summary
JoinFn(PType<K> keyType, PType<U> leftValueType)
          Instantiate with the PTypes of the join key and of the left-side value (used for creating deep copies of values).
 
Method Summary
abstract  String getJoinType()
          Returns the name of this join type.
 void initialize()
          Initialize this DoFn.
abstract  void join(K key, int id, Iterable<Pair<U,V>> pairs, Emitter<Pair<K,Pair<U,V>>> emitter)
          Performs the actual joining.
 void process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input, Emitter<Pair<K,Pair<U,V>>> emitter)
          Splits the input record into its key, side id, and grouped values, then passes them to join.
 
Methods inherited from class org.apache.crunch.DoFn
cleanup, configure, disableDeepCopy, scaleFactor, setContext
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

JoinFn

public JoinFn(PType<K> keyType,
              PType<U> leftValueType)
Instantiate with the PTypes of the join key and of the left-side value (used for creating deep copies of values).

Parameters:
keyType - The PType of the value used as the key of the join
leftValueType - The PType of the value type of the left side of the join
Method Detail

initialize

public void initialize()
Description copied from class: DoFn
Initialize this DoFn. This initialization will happen before the actual DoFn.process(Object, Emitter) is triggered. Subclasses may override this method to do appropriate initialization.

Called during the setup of the job instance this DoFn is associated with.

Overrides:
initialize in class DoFn<Pair<Pair<K,Integer>,Iterable<Pair<U,V>>>,Pair<K,Pair<U,V>>>

getJoinType

public abstract String getJoinType()
Returns:
The name of this join type (e.g. innerJoin, leftOuterJoin).

join

public abstract void join(K key,
                          int id,
                          Iterable<Pair<U,V>> pairs,
                          Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.

Parameters:
key - The key for this grouping of values.
id - The side that this group of values is from (0 -> left, 1 -> right).
pairs - The group of values associated with this key and id pair.
emitter - The emitter to send the output to.
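
As a rough illustration of the contract above, the following sketch shows one way an inner join could be written against the (key, id, pairs, emitter) signature. It is not the InnerJoinFn source. SketchPair and SketchEmitter are hypothetical stand-ins for Crunch's Pair and Emitter so that the example runs without Crunch on the classpath. The sketch also assumes that, for a given key, the left group (id 0) arrives before the right group (id 1), and that left values are carried in first() and right values in second().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for org.apache.crunch.Pair (not the real class).
final class SketchPair<A, B> {
  private final A first;
  private final B second;

  private SketchPair(A first, B second) {
    this.first = first;
    this.second = second;
  }

  static <A, B> SketchPair<A, B> of(A first, B second) {
    return new SketchPair<>(first, second);
  }

  A first() { return first; }
  B second() { return second; }
}

// Hypothetical stand-in for org.apache.crunch.Emitter.
interface SketchEmitter<T> {
  void emit(T value);
}

class InnerJoinSketch<K, U, V> {
  private K lastKey;                                   // key of the group seen most recently
  private final List<U> leftValues = new ArrayList<>(); // left values buffered for lastKey

  void join(K key, int id, Iterable<SketchPair<U, V>> pairs,
            SketchEmitter<SketchPair<K, SketchPair<U, V>>> emitter) {
    if (!key.equals(lastKey)) {
      // New key: forget the left values buffered for the previous key.
      lastKey = key;
      leftValues.clear();
    }
    if (id == 0) {
      // Left side: buffer the values (assumed to arrive first).
      for (SketchPair<U, V> pair : pairs) {
        if (pair.first() != null) {
          leftValues.add(pair.first());
        }
      }
    } else {
      // Right side: emit one record per (left, right) combination.
      for (SketchPair<U, V> pair : pairs) {
        for (U left : leftValues) {
          emitter.emit(SketchPair.of(lastKey, SketchPair.of(left, pair.second())));
        }
      }
    }
  }

  // Small driver: key "a" has both sides, key "b" has only a right side.
  static List<String> demo() {
    final List<String> out = new ArrayList<>();
    SketchEmitter<SketchPair<String, SketchPair<Integer, String>>> emitter =
        p -> out.add(p.first() + ":" + p.second().first() + "," + p.second().second());
    InnerJoinSketch<String, Integer, String> fn = new InnerJoinSketch<>();
    fn.join("a", 0, Arrays.asList(SketchPair.<Integer, String>of(1, null),
                                  SketchPair.<Integer, String>of(2, null)), emitter);
    fn.join("a", 1, Arrays.asList(SketchPair.<Integer, String>of(null, "x")), emitter);
    fn.join("b", 1, Arrays.asList(SketchPair.<Integer, String>of(null, "y")), emitter);
    return out;
  }
}
```

In this sketch, key "b" produces no output because no left values were buffered for it, which is the inner-join behavior. An outer-join variant would differ mainly in what it emits when one side is empty. Unlike a real JoinFn, the sketch stores references directly and does not deep-copy values through a PType.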

process

public void process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
                    Emitter<Pair<K,Pair<U,V>>> emitter)
Split up the input record to make coding a bit more manageable.

Specified by:
process in class DoFn<Pair<Pair<K,Integer>,Iterable<Pair<U,V>>>,Pair<K,Pair<U,V>>>
Parameters:
input - The input record.
emitter - The emitter to send the output to.
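
The nesting of the input type can be hard to read, so here is a minimal sketch of how such a record might be unpacked before joining. It assumes that process simply forwards the three components to join. Rec is a hypothetical stand-in for Crunch's Pair, and the join method here only reports what it received instead of emitting joined records.

```java
import java.util.Arrays;

// Hypothetical stand-in for org.apache.crunch.Pair (not the real class).
final class Rec<A, B> {
  private final A first;
  private final B second;

  private Rec(A first, B second) {
    this.first = first;
    this.second = second;
  }

  static <A, B> Rec<A, B> of(A first, B second) {
    return new Rec<>(first, second);
  }

  A first() { return first; }
  B second() { return second; }
}

final class ProcessSketch {
  // Stand-in for join(...): describes its arguments rather than joining.
  static <K, U, V> String join(K key, int id, Iterable<Rec<U, V>> pairs) {
    int size = 0;
    for (Rec<U, V> ignored : pairs) {
      size++;
    }
    return "key=" + key + " id=" + id + " size=" + size;
  }

  // Unpacks ((key, id), values) and forwards the pieces to join.
  static <K, U, V> String process(Rec<Rec<K, Integer>, Iterable<Rec<U, V>>> input) {
    K key = input.first().first();      // the join key
    int id = input.first().second();    // 0 -> left, 1 -> right
    return join(key, id, input.second());
  }

  static String demo() {
    Iterable<Rec<String, String>> values = Arrays.asList(
        Rec.<String, String>of(null, "x"),
        Rec.<String, String>of(null, "y"));
    Rec<String, Integer> keyAndSide = Rec.of("a", 1);
    Rec<Rec<String, Integer>, Iterable<Rec<String, String>>> input =
        Rec.of(keyAndSide, values);
    return process(input);
  }
}
```

Calling ProcessSketch.demo() returns "key=a id=1 size=2": the outer pair's first element supplies the key and side id, and its second element supplies the grouped values.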


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.