This project has retired. For details please refer to its Attic page.
JoinFn (Apache Crunch 0.9.0 API)

Class JoinFn<K,U,V>

java.lang.Object
  extended by org.apache.crunch.DoFn<Pair<Pair<K,Integer>,Iterable<Pair<U,V>>>,Pair<K,Pair<U,V>>>
      extended by org.apache.crunch.lib.join.JoinFn<K,U,V>
Type Parameters:
K - Type of the keys.
U - Type of the first PTable's values
V - Type of the second PTable's values
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
FullOuterJoinFn, InnerJoinFn, LeftOuterJoinFn, RightOuterJoinFn

public abstract class JoinFn<K,U,V>
extends DoFn<Pair<Pair<K,Integer>,Iterable<Pair<U,V>>>,Pair<K,Pair<U,V>>>

Represents a DoFn for performing joins.

See Also:
Serialized Form

Field Summary
protected  PType<K> keyType
protected  PType<U> leftValueType
Constructor Summary
JoinFn(PType<K> keyType, PType<U> leftValueType)
          Instantiate with the PType of the join key and the PType of the left-side value (used for creating deep copies of values).
Method Summary
abstract  String getJoinType()
 void initialize()
          Initialize this DoFn.
abstract  void join(K key, int id, Iterable<Pair<U,V>> pairs, Emitter<Pair<K,Pair<U,V>>> emitter)
          Performs the actual joining.
 void process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input, Emitter<Pair<K,Pair<U,V>>> emitter)
          Splits the input record into its key, side id, and group of values, which keeps join implementations simpler to write.
Methods inherited from class org.apache.crunch.DoFn
cleanup, configure, disableDeepCopy, getConfiguration, getContext, getCounter, getCounter, getStatus, getTaskAttemptID, increment, increment, increment, increment, progress, scaleFactor, setConfiguration, setContext, setStatus
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


protected PType<K> keyType


protected PType<U> leftValueType
Constructor Detail


public JoinFn(PType<K> keyType,
              PType<U> leftValueType)
Instantiate with the PType of the join key and the PType of the left-side value (used for creating deep copies of values).

Parameters:
keyType - The PType of the value used as the key of the join
leftValueType - The PType of the value type of the left side of the join
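
The deep-copy remark matters when a join function buffers left-side values while the framework reuses one object instance across records. The sketch below illustrates that hazard in plain Java. `MutableText`, `DeepCopySketch`, and the `copy` argument are hypothetical stand-ins, not Crunch classes; in a real JoinFn the copy would presumably be produced through `leftValueType`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a mutable value that a framework reuses between records.
final class MutableText {
  String value;
  MutableText(String value) { this.value = value; }
}

final class DeepCopySketch {
  // Buffers every record, applying `copy` to the reused object before storing it.
  static List<String> buffer(String[] records, UnaryOperator<MutableText> copy) {
    MutableText reused = new MutableText(null); // one instance, overwritten per record
    List<MutableText> buffered = new ArrayList<>();
    for (String record : records) {
      reused.value = record;
      buffered.add(copy.apply(reused));
    }
    List<String> out = new ArrayList<>();
    for (MutableText t : buffered) {
      out.add(t.value);
    }
    return out;
  }

  public static void main(String[] args) {
    String[] records = {"a", "b", "c"};
    // Without a copy, every buffered entry aliases the same object.
    System.out.println(buffer(records, t -> t));                        // [c, c, c]
    // With a deep copy, each buffered entry keeps its own value.
    System.out.println(buffer(records, t -> new MutableText(t.value))); // [a, b, c]
  }
}
```

Passing the identity function leaves three references to one object, so only the last record survives; copying before buffering preserves each value.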
Method Detail


public void initialize()
Description copied from class: DoFn
Initialize this DoFn. This initialization will happen before the actual DoFn.process(Object, Emitter) is triggered. Subclasses may override this method to do appropriate initialization.

Called during the setup of the job instance this DoFn is associated with.

Overrides:
initialize in class DoFn<Pair<Pair<K,Integer>,Iterable<Pair<U,V>>>,Pair<K,Pair<U,V>>>


public abstract String getJoinType()
The name of this join type (e.g. innerJoin, leftOuterJoin).


public abstract void join(K key,
                          int id,
                          Iterable<Pair<U,V>> pairs,
                          Emitter<Pair<K,Pair<U,V>>> emitter)
Performs the actual joining.

Parameters:
key - The key for this grouping of values.
id - The side that this group of values is from (0 -> left, 1 -> right).
pairs - The group of values associated with this key and id pair.
emitter - The emitter to send the output to.
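
As an illustration of the join(key, id, pairs, emitter) contract, here is a minimal, dependency-free sketch of inner-join-style logic. It is not the Crunch implementation: `P` is a hypothetical stand-in for Pair, a `Consumer` stands in for Emitter, no deep copies are made, and the sketch assumes that the left group (id 0) for a key arrives before that key's right group (id 1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical stand-in for org.apache.crunch.Pair; not the real class.
final class P<A, B> {
  final A first;
  final B second;
  P(A first, B second) { this.first = first; this.second = second; }
  @Override public String toString() { return "[" + first + "," + second + "]"; }
}

// Inner-join-style logic that follows the join(key, id, pairs, emitter) shape.
final class InnerJoinSketch<K, U, V> {
  private K lastKey;
  private final List<U> leftValues = new ArrayList<>();

  void join(K key, int id, Iterable<P<U, V>> pairs, Consumer<P<K, P<U, V>>> emitter) {
    if (!Objects.equals(key, lastKey)) { // new key: drop the previous key's buffer
      lastKey = key;
      leftValues.clear();
    }
    if (id == 0) { // left side: remember the values
      for (P<U, V> pair : pairs) {
        leftValues.add(pair.first);
      }
    } else { // right side: pair each right value with every buffered left value
      for (P<U, V> pair : pairs) {
        for (U left : leftValues) {
          emitter.accept(new P<>(key, new P<>(left, pair.second)));
        }
      }
    }
  }

  static List<String> demo() {
    InnerJoinSketch<String, Integer, String> fn = new InnerJoinSketch<>();
    List<String> out = new ArrayList<>();
    Consumer<P<String, P<Integer, String>>> emitter = p -> out.add(p.toString());
    fn.join("k1", 0, Arrays.asList(new P<Integer, String>(1, null), new P<Integer, String>(2, null)), emitter);
    fn.join("k1", 1, Arrays.asList(new P<Integer, String>(null, "x")), emitter);
    // "k2" has no left values, so nothing is emitted for it.
    fn.join("k2", 1, Arrays.asList(new P<Integer, String>(null, "y")), emitter);
    return out;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // [[k1,[1,x]], [k1,[2,x]]]
  }
}
```

An outer-join variant would differ mainly in what it emits when one side has no values for a key.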


public void process(Pair<Pair<K,Integer>,Iterable<Pair<U,V>>> input,
                    Emitter<Pair<K,Pair<U,V>>> emitter)
Splits the input record into its key, side id, and group of values, which keeps join implementations simpler to write.

Specified by:
process in class DoFn<Pair<Pair<K,Integer>,Iterable<Pair<U,V>>>,Pair<K,Pair<U,V>>>
Parameters:
input - The input record.
emitter - The emitter to send the output to.
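
The nested input type is easier to read once unpacked. The sketch below shows one plausible way a process method could split a ((key, id), values) record and hand the pieces to join. It is an assumption about the shape of the logic, not the Crunch source: `Tuple` and `ProcessSketch` are hypothetical, and a plain `List<String>` replaces the Emitter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a two-element tuple; not org.apache.crunch.Pair.
final class Tuple<A, B> {
  final A first;
  final B second;
  Tuple(A first, B second) { this.first = first; this.second = second; }
}

abstract class ProcessSketch<K, U, V> {
  // Receives the already-unpacked pieces of the record.
  abstract void join(K key, int id, Iterable<Tuple<U, V>> pairs, List<String> out);

  // Unpacks ((key, id), values) and delegates to join.
  void process(Tuple<Tuple<K, Integer>, Iterable<Tuple<U, V>>> input, List<String> out) {
    join(input.first.first, input.first.second, input.second, out);
  }

  static List<String> demo() {
    ProcessSketch<String, Integer, String> fn = new ProcessSketch<String, Integer, String>() {
      @Override
      void join(String key, int id, Iterable<Tuple<Integer, String>> pairs, List<String> out) {
        int count = 0;
        for (Tuple<Integer, String> ignored : pairs) {
          count++;
        }
        out.add(key + " side=" + id + " values=" + count);
      }
    };
    List<String> out = new ArrayList<>();
    Iterable<Tuple<Integer, String>> values = Arrays.asList(
        new Tuple<Integer, String>(7, null), new Tuple<Integer, String>(8, null));
    // One record for key "k" coming from side 1 with two grouped values.
    fn.process(new Tuple<>(new Tuple<>("k", 1), values), out);
    return out;
  }
}
```

Calling `ProcessSketch.demo()` returns a single entry, `k side=1 values=2`, showing that the key, side id, and group reach join as separate arguments.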

Copyright © 2014 The Apache Software Foundation. All Rights Reserved.