This project has retired. For details please refer to its Attic page.
BaseGroupedTable (Apache Crunch 0.9.0 API)

org.apache.crunch.impl.dist.collect
Class BaseGroupedTable<K,V>

java.lang.Object
  extended by org.apache.crunch.impl.dist.collect.PCollectionImpl<Pair<K,Iterable<V>>>
      extended by org.apache.crunch.impl.dist.collect.BaseGroupedTable<K,V>
All Implemented Interfaces:
PCollection<Pair<K,Iterable<V>>>, PGroupedTable<K,V>
Direct Known Subclasses:
PGroupedTableImpl, PGroupedTableImpl

public class BaseGroupedTable<K,V>
extends PCollectionImpl<Pair<K,Iterable<V>>>
implements PGroupedTable<K,V>


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.crunch.impl.dist.collect.PCollectionImpl
PCollectionImpl.Visitor
 
Field Summary
protected  GroupingOptions groupingOptions
           
protected  PTableBase<K,V> parent
           
protected  PGroupedTableType<K,V> ptype
           
 
Fields inherited from class org.apache.crunch.impl.dist.collect.PCollectionImpl
doOptions, materializedAt, pipeline
 
Constructor Summary
protected BaseGroupedTable(PTableBase<K,V> parent)
           
protected BaseGroupedTable(PTableBase<K,V> parent, GroupingOptions groupingOptions)
           
 
Method Summary
protected  void acceptInternal(PCollectionImpl.Visitor visitor)
           
 PTable<K,V> combineValues(Aggregator<V> agg)
          Combine the values in each group using the given Aggregator.
 PTable<K,V> combineValues(Aggregator<V> combineAgg, Aggregator<V> reduceAgg)
          Combine and reduces the values in each group using the given Aggregator instances.
 PTable<K,V> combineValues(CombineFn<K,V> combineFn)
          Combines the values of this grouping using the given CombineFn.
 PTable<K,V> combineValues(CombineFn<K,V> combineFn, CombineFn<K,V> reduceFn)
          Combines and reduces the values of this grouping using the given CombineFn instances.
protected  PCollectionImpl<Pair<K,Iterable<V>>> getChainingCollection()
          Retrieve the PCollectionImpl to be used for chaining within PCollectionImpls further down the pipeline.
 PGroupedTableType<K,V> getGroupedTableType()
          Return the PGroupedTableType containing serialization information for this PGroupedTable.
 long getLastModifiedAt()
           
 List<PCollectionImpl<?>> getParents()
           
 PType<Pair<K,Iterable<V>>> getPType()
          Returns the PType of this PCollection.
protected  ReadableData<Pair<K,Iterable<V>>> getReadableDataInternal()
           
protected  long getSizeInternal()
           
 Set<SourceTarget<?>> getTargetDependencies()
           
<U> PTable<K,U>
mapValues(MapFn<Iterable<V>,U> mapFn, PType<U> ptype)
          Maps the Iterable<V> elements of each record to a new type.
<U> PTable<K,U>
mapValues(String name, MapFn<Iterable<V>,U> mapFn, PType<U> ptype)
          Maps the Iterable<V> elements of each record to a new type.
 PTable<K,V> ungroup()
          Convert this grouping back into a multimap.
 
Methods inherited from class org.apache.crunch.impl.dist.collect.PCollectionImpl
accept, asCollection, asReadable, by, by, cache, cache, count, filter, filter, getDepth, getMaterializedAt, getName, getOnlyParent, getParallelDoOptions, getPipeline, getSize, getTypeFamily, isBreakpoint, length, materialize, materializeAt, materializedData, max, min, parallelDo, parallelDo, parallelDo, parallelDo, parallelDo, parallelDo, setBreakpoint, toString, union, union, write, write
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.apache.crunch.PCollection
asCollection, asReadable, by, by, cache, cache, count, filter, filter, getName, getPipeline, getSize, getTypeFamily, length, materialize, max, min, parallelDo, parallelDo, parallelDo, parallelDo, parallelDo, parallelDo, union, union, write, write
 

Field Detail

parent

protected final PTableBase<K,V> parent

groupingOptions

protected final GroupingOptions groupingOptions

ptype

protected final PGroupedTableType<K,V> ptype
Constructor Detail

BaseGroupedTable

protected BaseGroupedTable(PTableBase<K,V> parent)

BaseGroupedTable

protected BaseGroupedTable(PTableBase<K,V> parent,
                           GroupingOptions groupingOptions)
Method Detail

getReadableDataInternal

protected ReadableData<Pair<K,Iterable<V>>> getReadableDataInternal()
Specified by:
getReadableDataInternal in class PCollectionImpl<Pair<K,Iterable<V>>>

getSizeInternal

protected long getSizeInternal()
Specified by:
getSizeInternal in class PCollectionImpl<Pair<K,Iterable<V>>>

getPType

public PType<Pair<K,Iterable<V>>> getPType()
Description copied from interface: PCollection
Returns the PType of this PCollection.

Specified by:
getPType in interface PCollection<Pair<K,Iterable<V>>>

combineValues

public PTable<K,V> combineValues(CombineFn<K,V> combineFn,
                                 CombineFn<K,V> reduceFn)
Description copied from interface: PGroupedTable
Combines and reduces the values of this grouping using the given CombineFn instances.

Specified by:
combineValues in interface PGroupedTable<K,V>
Parameters:
combineFn - The combiner function during the combine phase
reduceFn - The combiner function during the reduce phase
Returns:
A PTable where each key has a single value

combineValues

public PTable<K,V> combineValues(CombineFn<K,V> combineFn)
Description copied from interface: PGroupedTable
Combines the values of this grouping using the given CombineFn.

Specified by:
combineValues in interface PGroupedTable<K,V>
Parameters:
combineFn - The combiner function
Returns:
A PTable where each key has a single value

combineValues

public PTable<K,V> combineValues(Aggregator<V> agg)
Description copied from interface: PGroupedTable
Combine the values in each group using the given Aggregator.

Specified by:
combineValues in interface PGroupedTable<K,V>
Parameters:
agg - The function to use
Returns:
A PTable where each group key maps to an aggregated value. Group keys may be repeated if an aggregator returns more than one value.

combineValues

public PTable<K,V> combineValues(Aggregator<V> combineAgg,
                                 Aggregator<V> reduceAgg)
Description copied from interface: PGroupedTable
Combine and reduces the values in each group using the given Aggregator instances.

Specified by:
combineValues in interface PGroupedTable<K,V>
Parameters:
combineAgg - The aggregator to use during the combine phase
reduceAgg - The aggregator to use during the reduce phase
Returns:
A PTable where each group key maps to an aggregated value. Group keys may be repeated if an aggregator returns more than one value.

ungroup

public PTable<K,V> ungroup()
Description copied from interface: PGroupedTable
Convert this grouping back into a multimap.

Specified by:
ungroup in interface PGroupedTable<K,V>
Returns:
an ungrouped version of the data in this PGroupedTable.

mapValues

public <U> PTable<K,U> mapValues(MapFn<Iterable<V>,U> mapFn,
                                 PType<U> ptype)
Description copied from interface: PGroupedTable
Maps the Iterable<V> elements of each record to a new type. Just like any parallelDo operation on a PGroupedTable, this may only be called once.

Specified by:
mapValues in interface PGroupedTable<K,V>
Parameters:
mapFn - The mapping function
ptype - The serialization information for the returned data
Returns:
A new PTable instance

mapValues

public <U> PTable<K,U> mapValues(String name,
                                 MapFn<Iterable<V>,U> mapFn,
                                 PType<U> ptype)
Description copied from interface: PGroupedTable
Maps the Iterable<V> elements of each record to a new type. Just like any parallelDo operation on a PGroupedTable, this may only be called once.

Specified by:
mapValues in interface PGroupedTable<K,V>
Parameters:
name - A name for this operation
mapFn - The mapping function
ptype - The serialization information for the returned data
Returns:
A new PTable instance

getGroupedTableType

public PGroupedTableType<K,V> getGroupedTableType()
Description copied from interface: PGroupedTable
Return the PGroupedTableType containing serialization information for this PGroupedTable.

Specified by:
getGroupedTableType in interface PGroupedTable<K,V>

getTargetDependencies

public Set<SourceTarget<?>> getTargetDependencies()
Overrides:
getTargetDependencies in class PCollectionImpl<Pair<K,Iterable<V>>>

getParents

public List<PCollectionImpl<?>> getParents()
Specified by:
getParents in class PCollectionImpl<Pair<K,Iterable<V>>>

getLastModifiedAt

public long getLastModifiedAt()
Specified by:
getLastModifiedAt in class PCollectionImpl<Pair<K,Iterable<V>>>

acceptInternal

protected void acceptInternal(PCollectionImpl.Visitor visitor)
Specified by:
acceptInternal in class PCollectionImpl<Pair<K,Iterable<V>>>

getChainingCollection

protected PCollectionImpl<Pair<K,Iterable<V>>> getChainingCollection()
Description copied from class: PCollectionImpl
Retrieve the PCollectionImpl to be used for chaining within PCollectionImpls further down the pipeline.

Overrides:
getChainingCollection in class PCollectionImpl<Pair<K,Iterable<V>>>
Returns:
The PCollectionImpl instance to be chained


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.