This project has retired. For details please refer to its Attic page.
PGroupedTable (Apache Crunch 0.10.0 API)

org.apache.crunch
Interface PGroupedTable<K,V>

All Superinterfaces:
PCollection<Pair<K,Iterable<V>>>
All Known Implementing Classes:
BaseGroupedTable, PGroupedTableImpl

public interface PGroupedTable<K,V>
extends PCollection<Pair<K,Iterable<V>>>

The Crunch representation of a grouped PTable, which corresponds to the output of the shuffle phase of a MapReduce job.
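A minimal in-memory sketch of the Pair<K, Iterable<V>> shape that a PGroupedTable represents, using only java.util. The class ShuffleModel and its groupByKey method are hypothetical names chosen for illustration, not Crunch API. The TreeMap mimics the key sorting that a MapReduce shuffle performs. The order of values within a group simply follows input order here, and a real shuffle does not generally promise that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical model of grouped data (key -> all values for that key); not Crunch's implementation. */
final class ShuffleModel {

    /** Groups (key, value) records by key; TreeMap keeps keys sorted, as a MapReduce shuffle does. */
    static <K extends Comparable<K>, V> SortedMap<K, List<V>> groupByKey(List<Map.Entry<K, V>> records) {
        SortedMap<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<K, V> record : records) {
            groups.computeIfAbsent(record.getKey(), k -> new ArrayList<>()).add(record.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> table = List.of(
                Map.entry("b", 2), Map.entry("a", 1), Map.entry("b", 3));
        System.out.println(groupByKey(table)); // {a=[1], b=[2, 3]}
    }
}
```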


Method Summary
 PTable<K,V> combineValues(Aggregator<V> aggregator)
          Combine the values in each group using the given Aggregator.
 PTable<K,V> combineValues(Aggregator<V> combineAggregator, Aggregator<V> reduceAggregator)
          Combines and reduces the values in each group using the given Aggregator instances.
 PTable<K,V> combineValues(CombineFn<K,V> combineFn)
          Combines the values of this grouping using the given CombineFn.
 PTable<K,V> combineValues(CombineFn<K,V> combineFn, CombineFn<K,V> reduceFn)
          Combines and reduces the values of this grouping using the given CombineFn instances.
 PGroupedTableType<K,V> getGroupedTableType()
          Return the PGroupedTableType containing serialization information for this PGroupedTable.
 <U> PTable<K,U> mapValues(MapFn<Iterable<V>,U> mapFn, PType<U> ptype)
          Maps the Iterable<V> elements of each record to a new type.
 <U> PTable<K,U> mapValues(String name, MapFn<Iterable<V>,U> mapFn, PType<U> ptype)
          Maps the Iterable<V> elements of each record to a new type.
 PTable<K,V> ungroup()
          Convert this grouping back into a multimap.
 
Methods inherited from interface org.apache.crunch.PCollection
aggregate, asCollection, asReadable, by, by, cache, cache, count, filter, filter, first, getName, getPipeline, getPType, getSize, getTypeFamily, length, materialize, max, min, parallelDo, parallelDo, parallelDo, parallelDo, parallelDo, parallelDo, union, union, write, write
 

Method Detail

combineValues

PTable<K,V> combineValues(CombineFn<K,V> combineFn)
Combines the values of this grouping using the given CombineFn.

Parameters:
combineFn - The combiner function
Returns:
A PTable where each key has a single value
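The following is a hypothetical model of combineValues(CombineFn) in plain Java, not Crunch's implementation: a java.util.function.BinaryOperator stands in for the CombineFn and folds each group's values into a single value per key. It assumes the function is associative and commutative, which is the usual requirement for anything that may also run as a MapReduce combiner on partial groups.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

/** Hypothetical model of combineValues(CombineFn); not Crunch's implementation. */
final class CombineModel {

    /**
     * Folds every group's values into one value per key. fn is assumed to be
     * associative and commutative, as a combiner-safe function must be.
     */
    static <K, V> Map<K, V> combineValues(Map<K, List<V>> grouped, BinaryOperator<V> fn) {
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
            // Groups produced by a shuffle are never empty, so get() is safe here.
            result.put(group.getKey(), group.getValue().stream().reduce(fn).get());
        }
        return result;
    }

    public static void main(String[] args) {
        // Word counts after grouping: each occurrence contributed a 1L.
        Map<String, List<Long>> grouped = new TreeMap<>(Map.of(
                "crunch", List.of(1L, 1L, 1L),
                "hadoop", List.of(1L)));
        System.out.println(combineValues(grouped, Long::sum)); // {crunch=3, hadoop=1}
    }
}
```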

combineValues

PTable<K,V> combineValues(CombineFn<K,V> combineFn,
                          CombineFn<K,V> reduceFn)
Combines and reduces the values of this grouping using the given CombineFn instances.

Parameters:
combineFn - The combiner function during the combine phase
reduceFn - The combiner function during the reduce phase
Returns:
A PTable where each key has a single value
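A sketch of the two-phase variant, again in plain Java rather than Crunch types: records are split into hypothetical map-side partitions, combineFn pre-aggregates each partition, and reduceFn folds the partial results for each key. The usage in main passes Integer::sum for both phases, which gives the same totals as a single pass. The model runs the combine step exactly once per partition; Hadoop itself may run a combiner zero or more times, so reduceFn should produce the right answer either way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

/** Hypothetical two-phase model of combineValues(combineFn, reduceFn); not Crunch's implementation. */
final class TwoPhaseCombineModel {

    /** Folds the values of each key with fn; keys come back sorted. */
    private static <K extends Comparable<K>, V> Map<K, V> foldByKey(
            List<Map.Entry<K, V>> records, BinaryOperator<V> fn) {
        Map<K, V> out = new TreeMap<>();
        for (Map.Entry<K, V> record : records) {
            out.merge(record.getKey(), record.getValue(), fn);
        }
        return out;
    }

    static <K extends Comparable<K>, V> Map<K, V> combineValues(
            List<List<Map.Entry<K, V>>> mapSidePartitions,
            BinaryOperator<V> combineFn,
            BinaryOperator<V> reduceFn) {
        List<Map.Entry<K, V>> shuffled = new ArrayList<>();
        for (List<Map.Entry<K, V>> partition : mapSidePartitions) {
            // Combine phase: each partition is pre-aggregated on its own.
            shuffled.addAll(foldByKey(partition, combineFn).entrySet());
        }
        // Reduce phase: the partial results for each key are folded together.
        return foldByKey(shuffled, reduceFn);
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> partitions = List.of(
                List.of(Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1)),
                List.of(Map.entry("a", 1)));
        System.out.println(combineValues(partitions, Integer::sum, Integer::sum)); // {a=3, b=1}
    }
}
```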

combineValues

PTable<K,V> combineValues(Aggregator<V> aggregator)
Combine the values in each group using the given Aggregator.

Parameters:
aggregator - The function to use
Returns:
A PTable where each group key maps to an aggregated value. Group keys may be repeated if an aggregator returns more than one value.
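To illustrate why group keys may repeat, the sketch below declares a hypothetical SimpleAggregator interface (reset, update, results) so that it compiles without Crunch; it is an assumption for illustration, not Crunch's Aggregator interface. The TopN aggregator can emit up to n values per group, so one key can appear several times in the output table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

/** Hypothetical aggregator shape: reset() per group, update() per value, then results(). */
interface SimpleAggregator<V> {
    void reset();
    void update(V value);
    Iterable<V> results();
}

/** Keeps the n largest values seen, so it may emit up to n results per group. */
final class TopN implements SimpleAggregator<Integer> {
    private final int n;
    private final PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap of the current top n

    TopN(int n) { this.n = n; }

    public void reset() { heap.clear(); }

    public void update(Integer value) {
        heap.add(value);
        if (heap.size() > n) {
            heap.poll(); // drop the smallest so only the n largest remain
        }
    }

    public Iterable<Integer> results() {
        List<Integer> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }
}

final class AggregatorModel {
    /** Runs agg over each group and emits one (key, result) pair per result, so keys may repeat. */
    static <K, V> List<Map.Entry<K, V>> combineValues(Map<K, List<V>> grouped, SimpleAggregator<V> agg) {
        List<Map.Entry<K, V>> table = new ArrayList<>();
        for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
            agg.reset();
            for (V v : group.getValue()) {
                agg.update(v);
            }
            for (V result : agg.results()) {
                table.add(Map.entry(group.getKey(), result));
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new TreeMap<>(Map.of(
                "x", List.of(5, 1, 9, 7),
                "y", List.of(4)));
        // Key "x" appears twice because TopN(2) emits two results for it.
        System.out.println(combineValues(grouped, new TopN(2))); // [x=9, x=7, y=4]
    }
}
```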

combineValues

PTable<K,V> combineValues(Aggregator<V> combineAggregator,
                          Aggregator<V> reduceAggregator)
Combines and reduces the values in each group using the given Aggregator instances.

Parameters:
combineAggregator - The aggregator to use during the combine phase
reduceAggregator - The aggregator to use during the reduce phase
Returns:
A PTable where each group key maps to an aggregated value. Group keys may be repeated if an aggregator returns more than one value.
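A plain-Java sketch of why the combine and reduce aggregators might differ, using a count-distinct pattern with hypothetical PhaseAggregator, Distinct, and CountDistinct types (none of them Crunch classes). The combine aggregator only removes duplicates inside each map-side partition, which shrinks the shuffled data. The reduce aggregator must de-duplicate again, because the same value can arrive from several partitions, and then emits the count. Both sides share the value type Long, matching the single type parameter in the signature above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical aggregator shape: reset per group, update per value, then read results. */
interface PhaseAggregator<V> {
    void reset();
    void update(V value);
    Iterable<V> results();
}

/** Combine side: forwards each distinct value once, shrinking what is shuffled. */
final class Distinct implements PhaseAggregator<Long> {
    private final Set<Long> seen = new LinkedHashSet<>();
    public void reset() { seen.clear(); }
    public void update(Long value) { seen.add(value); }
    public Iterable<Long> results() { return new ArrayList<>(seen); }
}

/** Reduce side: de-duplicates again (partitions may overlap) and emits the count. */
final class CountDistinct implements PhaseAggregator<Long> {
    private final Set<Long> seen = new LinkedHashSet<>();
    public void reset() { seen.clear(); }
    public void update(Long value) { seen.add(value); }
    public Iterable<Long> results() { return List.of((long) seen.size()); }
}

final class TwoPhaseAggregatorModel {

    /** Groups records by key (sorted) and runs agg on each group, emitting (key, result) pairs. */
    private static <K extends Comparable<K>, V> List<Map.Entry<K, V>> aggregate(
            List<Map.Entry<K, V>> records, PhaseAggregator<V> agg) {
        Map<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<K, V> r : records) {
            groups.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        List<Map.Entry<K, V>> out = new ArrayList<>();
        for (Map.Entry<K, List<V>> g : groups.entrySet()) {
            agg.reset();
            g.getValue().forEach(agg::update);
            for (V result : agg.results()) {
                out.add(Map.entry(g.getKey(), result));
            }
        }
        return out;
    }

    static <K extends Comparable<K>, V> List<Map.Entry<K, V>> combineValues(
            List<List<Map.Entry<K, V>>> mapSidePartitions,
            PhaseAggregator<V> combineAggregator,
            PhaseAggregator<V> reduceAggregator) {
        List<Map.Entry<K, V>> shuffled = new ArrayList<>();
        for (List<Map.Entry<K, V>> partition : mapSidePartitions) {
            shuffled.addAll(aggregate(partition, combineAggregator)); // combine phase
        }
        return aggregate(shuffled, reduceAggregator); // reduce phase
    }

    public static void main(String[] args) {
        // Hypothetical (page, userId) visits split across two map-side partitions.
        List<List<Map.Entry<String, Long>>> partitions = List.of(
                List.of(Map.entry("home", 1L), Map.entry("home", 1L), Map.entry("home", 2L)),
                List.of(Map.entry("home", 2L), Map.entry("home", 3L), Map.entry("about", 1L)));
        // Three distinct users visited "home", one visited "about".
        System.out.println(combineValues(partitions, new Distinct(), new CountDistinct())); // [about=1, home=3]
    }
}
```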

mapValues

<U> PTable<K,U> mapValues(MapFn<Iterable<V>,U> mapFn,
                          PType<U> ptype)
Maps the Iterable<V> elements of each record to a new type. Just like any parallelDo operation on a PGroupedTable, this may only be called once.

Parameters:
mapFn - The mapping function
ptype - The serialization information for the returned data
Returns:
A new PTable instance
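A hypothetical model of mapValues in plain Java, not Crunch's implementation: a java.util.function.Function over each group's Iterable<V> produces exactly one U per key, here the mean of the group as a Double. The ptype argument has no counterpart in the model because nothing is serialized. The mean helper reads the values in a single pass, a sensible habit since the values handed to a Hadoop reducer can generally be traversed only once.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical model of mapValues; not Crunch's implementation. */
final class MapValuesModel {

    /** Applies mapFn to each group's values; every key keeps exactly one output value. */
    static <K, V, U> Map<K, U> mapValues(Map<K, List<V>> grouped, Function<Iterable<V>, U> mapFn) {
        Map<K, U> out = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
            out.put(group.getKey(), mapFn.apply(group.getValue()));
        }
        return out;
    }

    /** Mean of one group, computed in a single pass over its values. */
    static double mean(Iterable<Integer> values) {
        long sum = 0;
        long count = 0;
        for (int v : values) {
            sum += v;
            count++;
        }
        return (double) sum / count;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new TreeMap<>(Map.of(
                "a", List.of(1, 2, 3, 4),
                "b", List.of(10)));
        System.out.println(mapValues(grouped, MapValuesModel::mean)); // {a=2.5, b=10.0}
    }
}
```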

mapValues

<U> PTable<K,U> mapValues(String name,
                          MapFn<Iterable<V>,U> mapFn,
                          PType<U> ptype)
Maps the Iterable<V> elements of each record to a new type. Just like any parallelDo operation on a PGroupedTable, this may only be called once.

Parameters:
name - A name for this operation
mapFn - The mapping function
ptype - The serialization information for the returned data
Returns:
A new PTable instance

ungroup

PTable<K,V> ungroup()
Convert this grouping back into a multimap.

Returns:
An ungrouped version of the data in this PGroupedTable.
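As a sketch of what ungroup produces, the hypothetical UngroupModel below flattens each key and its values back into one (key, value) record per value, so a key repeats once for every value it grouped. It uses java.util types only and is not Crunch's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of ungroup(); not Crunch's implementation. */
final class UngroupModel {

    /** Flattens key -> values back into one (key, value) record per value, i.e. a multimap. */
    static <K, V> List<Map.Entry<K, V>> ungroup(Map<K, List<V>> grouped) {
        List<Map.Entry<K, V>> table = new ArrayList<>();
        for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
            for (V value : group.getValue()) {
                table.add(Map.entry(group.getKey(), value));
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new TreeMap<>(Map.of("a", List.of(1), "b", List.of(2, 3)));
        // The key "b" repeats once per value.
        System.out.println(ungroup(grouped)); // [a=1, b=2, b=3]
    }
}
```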

getGroupedTableType

PGroupedTableType<K,V> getGroupedTableType()
Return the PGroupedTableType containing serialization information for this PGroupedTable.



Copyright © 2014 The Apache Software Foundation. All Rights Reserved.