Project Crunch has retired. For details please refer to its Attic page.
Distinct (Apache Crunch 0.10.0 API)

org.apache.crunch.lib
Class Distinct

java.lang.Object
  extended by org.apache.crunch.lib.Distinct

public final class Distinct
extends Object

Functions for computing the distinct elements of a PCollection.


Method Summary
static <S> PCollection<S> distinct(PCollection<S> input)
          Construct a new PCollection that contains the unique elements of a given input PCollection.
static <S> PCollection<S> distinct(PCollection<S> input, int flushEvery)
          A distinct operation that lets the client control how frequently elements are flushed to disk, trading off performance against memory consumption.
static <K,V> PTable<K,V> distinct(PTable<K,V> input)
          A PTable<K, V> analogue of the distinct function.
static <K,V> PTable<K,V> distinct(PTable<K,V> input, int flushEvery)
          A PTable<K, V> analogue of the distinct function that also takes a flushEvery argument.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

distinct

public static <S> PCollection<S> distinct(PCollection<S> input)
Construct a new PCollection that contains the unique elements of a given input PCollection.

Parameters:
input - The input PCollection
Returns:
A new PCollection that contains the unique elements of the input
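
The semantics described above can be modelled in plain Java. This is only a sketch of the expected result, not the Crunch implementation: the class name DistinctSketch is hypothetical, an Iterable stands in for a PCollection, and uniqueness is assumed to follow equals() and hashCode(). The sketch keeps first-seen order for readability; no ordering is claimed for the real method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical plain-Java model of "distinct"; it does not use Crunch.
final class DistinctSketch {
    private DistinctSketch() {}

    // Returns each element of the input once, based on equals()/hashCode().
    static <S> List<S> distinct(Iterable<S> input) {
        LinkedHashSet<S> seen = new LinkedHashSet<>();
        for (S element : input) {
            seen.add(element); // duplicates are ignored by the set
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(distinct(input)); // prints [a, b, c]
    }
}
```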

distinct

public static <K,V> PTable<K,V> distinct(PTable<K,V> input)
A PTable<K, V> analogue of the distinct function.
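
As a rough illustration of what a table analogue could mean, the sketch below assumes that each key/value pair is treated as a single element, so two rows are duplicates only when both key and value are equal. The page does not spell this out, so treat it as an assumption. PairDistinctSketch and its pair helper are hypothetical names, and Map.Entry stands in for a table row.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of distinct over key/value rows; it does not use Crunch.
final class PairDistinctSketch {
    private PairDistinctSketch() {}

    // Builds an immutable row whose equals()/hashCode() cover both key and value.
    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new SimpleImmutableEntry<>(key, value);
    }

    // Assumption: a row is a duplicate only if key AND value both match an earlier row.
    static <K, V> List<Map.Entry<K, V>> distinct(Iterable<Map.Entry<K, V>> input) {
        LinkedHashSet<Map.Entry<K, V>> seen = new LinkedHashSet<>();
        for (Map.Entry<K, V> row : input) {
            seen.add(row);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> rows = Arrays.asList(
            pair("k1", 1), pair("k1", 1), pair("k1", 2), pair("k2", 1));
        System.out.println(distinct(rows)); // prints [k1=1, k1=2, k2=1]
    }
}
```

Note that under this assumption a key may still appear more than once in the result, as k1 does here.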


distinct

public static <S> PCollection<S> distinct(PCollection<S> input,
                                          int flushEvery)
A distinct operation that lets the client control how frequently elements are flushed to disk, trading off performance against memory consumption.

Parameters:
input - The input PCollection
flushEvery - Flush the elements to disk whenever we encounter this many unique values
Returns:
A new PCollection that contains the unique elements of the input
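
One way to picture the trade-off behind flushEvery is a bounded in-memory buffer of unique values that is emptied whenever it fills up, followed by a final pass that removes duplicates across flushes. The sketch below models that idea in plain Java. It is not the Crunch implementation: FlushingDistinctSketch is a hypothetical name, an in-memory list stands in for disk, and the exact flush threshold (here, as soon as the buffer holds flushEvery values) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of buffered de-duplication with periodic flushes; it does not use Crunch.
final class FlushingDistinctSketch<S> {
    private final int flushEvery;
    private final Set<S> buffer = new HashSet<>();     // bounded in-memory set of unique values
    private final List<S> flushed = new ArrayList<>(); // stand-in for values written to disk

    FlushingDistinctSketch(int flushEvery) {
        if (flushEvery < 1) {
            throw new IllegalArgumentException("flushEvery must be positive");
        }
        this.flushEvery = flushEvery;
    }

    // Buffers the value and flushes once the buffer holds flushEvery unique values.
    void process(S value) {
        buffer.add(value);
        if (buffer.size() >= flushEvery) {
            flush();
        }
    }

    private void flush() {
        flushed.addAll(buffer);
        buffer.clear();
    }

    // Flushes the remainder, then removes duplicates that span separate flushes.
    Set<S> finish() {
        flush();
        return new HashSet<>(flushed);
    }

    int flushedCount() {
        return flushed.size();
    }

    public static void main(String[] args) {
        for (int flushEvery : new int[] {2, 100}) {
            FlushingDistinctSketch<String> sketch = new FlushingDistinctSketch<>(flushEvery);
            for (String s : Arrays.asList("a", "b", "a", "c", "a")) {
                sketch.process(s);
            }
            Set<String> unique = sketch.finish();
            System.out.println(flushEvery + ": " + unique.size() + " unique, "
                + sketch.flushedCount() + " flushed");
        }
        // prints "2: 3 unique, 5 flushed" and then "100: 3 unique, 3 flushed"
    }
}
```

In this model a small flushEvery keeps the buffer small but writes more intermediate values, while a large flushEvery removes more duplicates in memory before anything is written.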

distinct

public static <K,V> PTable<K,V> distinct(PTable<K,V> input,
                                         int flushEvery)
A PTable<K, V> analogue of the distinct function that also takes a flushEvery argument controlling how often elements are flushed to disk.



Copyright © 2014 The Apache Software Foundation. All Rights Reserved.