This project has retired. For details please refer to its Attic page.
SecondarySort (Apache Crunch 0.9.0 API)

org.apache.crunch.lib
Class SecondarySort

java.lang.Object
  extended by org.apache.crunch.lib.SecondarySort

public class SecondarySort
extends Object

Utilities for performing a secondary sort on a PTable<K, Pair<V1, V2>> collection.

Secondary sorts are usually performed during sessionization: given a collection of events, we want to group them by a key (such as a user ID), then sort the grouped records by an auxiliary key (such as a timestamp), and then perform some additional processing on the sorted records.
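The group-then-sort-then-process pattern described above can be illustrated with a small in-memory sketch that uses only the JDK. It is not Crunch code and does not reflect how this class is implemented; in a MapReduce-style pipeline the ordering is typically produced during the shuffle rather than by sorting each group in memory. The `Event` type, the `actionsInOrder` helper, and the sample data are hypothetical names introduced for illustration, with `userId` standing in for K, `timestamp` for V1, and `action` for V2.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SecondarySortSketch {

    // Hypothetical record type: userId plays the role of K,
    // timestamp the role of V1, and action the role of V2.
    static final class Event {
        final String userId;
        final long timestamp;
        final String action;

        Event(String userId, long timestamp, String action) {
            this.userId = userId;
            this.timestamp = timestamp;
            this.action = action;
        }
    }

    // Groups events by userId, sorts each group by timestamp, and then
    // joins the actions of each group in time order.
    static Map<String, String> actionsInOrder(List<Event> events) {
        // Step 1: group by the primary key.
        Map<String, List<Event>> grouped = new TreeMap<>();
        for (Event e : events) {
            grouped.computeIfAbsent(e.userId, k -> new ArrayList<>()).add(e);
        }

        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<Event>> entry : grouped.entrySet()) {
            // Step 2: sort the group by the auxiliary key.
            List<Event> group = entry.getValue();
            group.sort(Comparator.comparingLong((Event e) -> e.timestamp));

            // Step 3: process the sorted records.
            List<String> actions = new ArrayList<>();
            for (Event e : group) {
                actions.add(e.action);
            }
            result.put(entry.getKey(), String.join(">", actions));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event("alice", 30L, "logout"));
        events.add(new Event("bob", 5L, "view"));
        events.add(new Event("alice", 10L, "login"));
        events.add(new Event("alice", 20L, "click"));
        System.out.println(actionsInOrder(events));
    }
}
```

Sorting each group in memory works only while a single key's records fit on one machine, which is the limitation a shuffle-based secondary sort is meant to avoid.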


Constructor Summary
SecondarySort()
           
 
Method Summary
static <K,V1,V2,U,V> PTable<U,V> sortAndApply(PTable<K,Pair<V1,V2>> input, DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn, PTableType<U,V> ptype)
          Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PTable<U, V>.
static <K,V1,V2,U,V> PTable<U,V> sortAndApply(PTable<K,Pair<V1,V2>> input, DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn, PTableType<U,V> ptype, int numReducers)
          Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PTable<U, V>, using the given number of reducers.
static <K,V1,V2,T> PCollection<T> sortAndApply(PTable<K,Pair<V1,V2>> input, DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn, PType<T> ptype)
          Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PCollection<T>.
static <K,V1,V2,T> PCollection<T> sortAndApply(PTable<K,Pair<V1,V2>> input, DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn, PType<T> ptype, int numReducers)
          Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PCollection<T>, using the given number of reducers.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SecondarySort

public SecondarySort()

Method Detail

sortAndApply

public static <K,V1,V2,T> PCollection<T> sortAndApply(PTable<K,Pair<V1,V2>> input,
                                                      DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
                                                      PType<T> ptype)
Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PCollection<T>.
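Because the DoFn is handed each key together with an Iterable of its values that has already been sorted (presumably by the V1 component, which is an assumption drawn from the class description), the per-key logic can often be written as a single pass with no buffering. The following JDK-only sketch shows that style of processing on a plain `Iterable<Long>` of timestamps; it does not use the Crunch API, and `countSessions` and `maxGap` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class SortedGroupSketch {

    // Counts sessions in one key's timestamps, which must already be in
    // ascending order. A new session starts at the first timestamp and
    // whenever the gap to the previous timestamp exceeds maxGap.
    static int countSessions(Iterable<Long> sortedTimestamps, long maxGap) {
        int sessions = 0;
        Long previous = null;
        for (Long ts : sortedTimestamps) {
            if (previous == null || ts - previous > maxGap) {
                sessions++;
            }
            previous = ts;
        }
        return sessions;
    }

    public static void main(String[] args) {
        List<Long> sorted = Arrays.asList(0L, 10L, 20L, 500L, 510L);
        System.out.println(countSessions(sorted, 60L));
    }
}
```

The single pass is correct only because the input is ordered; on unsorted timestamps the same loop would miscount sessions.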


sortAndApply

public static <K,V1,V2,T> PCollection<T> sortAndApply(PTable<K,Pair<V1,V2>> input,
                                                      DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
                                                      PType<T> ptype,
                                                      int numReducers)
Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PCollection<T>, using the given number of reducers.


sortAndApply

public static <K,V1,V2,U,V> PTable<U,V> sortAndApply(PTable<K,Pair<V1,V2>> input,
                                                     DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
                                                     PTableType<U,V> ptype)
Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PTable<U, V>.


sortAndApply

public static <K,V1,V2,U,V> PTable<U,V> sortAndApply(PTable<K,Pair<V1,V2>> input,
                                                     DoFn<Pair<K,Iterable<Pair<V1,V2>>>,Pair<U,V>> doFn,
                                                     PTableType<U,V> ptype,
                                                     int numReducers)
Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PTable<U, V>, using the given number of reducers.



Copyright © 2014 The Apache Software Foundation. All Rights Reserved.