This project has retired. For details please refer to its Attic page.
Set (Apache Crunch 0.3.0-incubating API)

org.apache.crunch.lib
Class Set

java.lang.Object
  extended by org.apache.crunch.lib.Set

public class Set
extends Object

Utilities for performing set operations (difference, intersection, etc.) on PCollection instances.


Constructor Summary
Set()
           
 
Method Summary
static <T> PCollection<Tuple3<T,T,T>> comm(PCollection<T> coll1, PCollection<T> coll2)
          Classify the elements of two sets, like the Unix comm utility, by whether they occur only in the first set, only in the second, or in both.
static <T> PCollection<T> difference(PCollection<T> coll1, PCollection<T> coll2)
          Compute the set difference between two sets of elements.
static <T> PCollection<T> intersection(PCollection<T> coll1, PCollection<T> coll2)
          Compute the intersection of two sets of elements.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Set

public Set()
Method Detail

difference

public static <T> PCollection<T> difference(PCollection<T> coll1,
                                            PCollection<T> coll2)
Compute the set difference between two sets of elements.

Returns:
a collection containing elements that are in coll1 but not in coll2
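The semantics of difference can be modelled in memory with java.util collections. The sketch below is not the Crunch API: java.util.List stands in for PCollection, and the class name DifferenceSketch is hypothetical. It treats each input as a mathematical set, so duplicates collapse; whether the Crunch method deduplicates its output in the same way is an assumption of this sketch.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of Set.difference; java.util collections
// stand in for PCollection, so nothing here is Crunch API.
final class DifferenceSketch {

  // Returns the distinct elements of coll1 that do not occur in coll2,
  // in the order they first appear in coll1.
  static <T> Set<T> difference(List<T> coll1, List<T> coll2) {
    Set<T> result = new LinkedHashSet<>(coll1);   // collapse duplicates in coll1
    result.removeAll(new LinkedHashSet<>(coll2)); // drop anything also in coll2
    return result;
  }
}
```

For example, with coll1 = [a, b, b, c] and coll2 = [b, d], the sketch returns [a, c].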

intersection

public static <T> PCollection<T> intersection(PCollection<T> coll1,
                                              PCollection<T> coll2)
Compute the intersection of two sets of elements.

Returns:
a collection containing the elements that are common to both coll1 and coll2
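An in-memory model of intersection follows the same pattern. Again this is a hypothetical sketch rather than Crunch code: java.util.List stands in for PCollection, IntersectionSketch is an illustrative name, and duplicates are assumed to collapse because the inputs are treated as sets.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of Set.intersection; java.util collections
// stand in for PCollection, so nothing here is Crunch API.
final class IntersectionSketch {

  // Returns the distinct elements that occur in both coll1 and coll2,
  // in the order they first appear in coll1.
  static <T> Set<T> intersection(List<T> coll1, List<T> coll2) {
    Set<T> result = new LinkedHashSet<>(coll1);   // collapse duplicates in coll1
    result.retainAll(new LinkedHashSet<>(coll2)); // keep only elements also in coll2
    return result;
  }
}
```

For example, with coll1 = [a, b, b, c] and coll2 = [b, c, d], the sketch returns [b, c].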

comm

public static <T> PCollection<Tuple3<T,T,T>> comm(PCollection<T> coll1,
                                                  PCollection<T> coll2)
Classify the elements of two sets, like the Unix comm utility. This method returns a PCollection of Tuple3 objects; the position an element occupies in its tuple depends on which collections it belongs to:
  1. first position: elements only in coll1,
  2. second position: elements only in coll2,
  3. third position: elements in both collections.
The other two positions of each tuple are null.

Returns:
a collection of Tuple3 objects
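The tuple layout described above can be modelled in memory as follows. This is a hypothetical sketch, not Crunch code: CommTriple is an illustrative stand-in for Crunch's Tuple3, java.util.List stands in for PCollection, and the output order (coll1's elements first, then those only in coll2) is a property of this sketch only; a distributed PCollection makes no such ordering promise.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical in-memory model of Set.comm. CommTriple stands in for Tuple3.
final class CommSketch {

  // Exactly one field is non-null: first = only in coll1,
  // second = only in coll2, third = in both.
  static final class CommTriple<T> {
    final T first;
    final T second;
    final T third;

    CommTriple(T first, T second, T third) {
      this.first = first;
      this.second = second;
      this.third = third;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof CommTriple)) {
        return false;
      }
      CommTriple<?> other = (CommTriple<?>) o;
      return Objects.equals(first, other.first)
          && Objects.equals(second, other.second)
          && Objects.equals(third, other.third);
    }

    @Override
    public int hashCode() {
      return Objects.hash(first, second, third);
    }

    @Override
    public String toString() {
      return "(" + first + ", " + second + ", " + third + ")";
    }
  }

  // Emits one triple per distinct element of coll1 or coll2.
  static <T> List<CommTriple<T>> comm(List<T> coll1, List<T> coll2) {
    Set<T> set1 = new LinkedHashSet<>(coll1);
    Set<T> set2 = new LinkedHashSet<>(coll2);
    Set<T> all = new LinkedHashSet<>(set1);
    all.addAll(set2); // every distinct element, coll1's elements first
    List<CommTriple<T>> result = new ArrayList<>();
    for (T t : all) {
      boolean in1 = set1.contains(t);
      boolean in2 = set2.contains(t);
      if (in1 && in2) {
        result.add(new CommTriple<T>(null, null, t)); // in both
      } else if (in1) {
        result.add(new CommTriple<T>(t, null, null)); // only in coll1
      } else {
        result.add(new CommTriple<T>(null, t, null)); // only in coll2
      }
    }
    return result;
  }
}
```

For example, with coll1 = [a, b, c] and coll2 = [b, d], the sketch returns (a, null, null), (null, null, b), (c, null, null), (null, d, null).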


Copyright © 2012 The Apache Software Foundation. All Rights Reserved.