This project has retired. For details please refer to its Attic page.
PTable (Apache Crunch 0.11.0 API)

org.apache.crunch
Interface PTable<K,V>

All Superinterfaces:
PCollection<Pair<K,V>>
All Known Implementing Classes:
BaseDoTable, BaseInputTable, BaseUnionTable, DoTable, EmptyPTable, EmptyPTable, InputTable, PTableBase, UnionTable

public interface PTable<K,V>
extends PCollection<Pair<K,V>>

A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.


Method Summary
 PObject<Map<K,V>> asMap()
          Returns a PObject encapsulating a Map made up of the keys and values in this PTable.
 PTable<K,V> bottom(int count)
          Returns a PTable made up of the pairs in this PTable with the smallest value field.
 PTable<K,V> cache()
          Marks this data as cached using the default CachingOptions.
 PTable<K,V> cache(CachingOptions options)
          Marks this data as cached using the given CachingOptions.
 <U> PTable<K,Pair<Collection<V>,Collection<U>>> cogroup(PTable<K,U> other)
          Co-group operation with the given table on common keys.
 PTable<K,Collection<V>> collectValues()
          Aggregate all of the values with the same key into a single key-value pair in the returned PTable.
 PTable<K,V> filter(FilterFn<Pair<K,V>> filterFn)
          Apply the given filter function to this instance and return the resulting PTable.
 PTable<K,V> filter(String name, FilterFn<Pair<K,V>> filterFn)
          Apply the given filter function to this instance and return the resulting PTable.
 PType<K> getKeyType()
          Returns the PType of the key.
 PTableType<K,V> getPTableType()
          Returns the PTableType of this PTable.
 PType<V> getValueType()
          Returns the PType of the value.
 PGroupedTable<K,V> groupByKey()
          Performs a grouping operation on the keys of this table.
 PGroupedTable<K,V> groupByKey(GroupingOptions options)
          Performs a grouping operation on the keys of this table, using the additional GroupingOptions to control how the grouping is executed.
 PGroupedTable<K,V> groupByKey(int numPartitions)
          Performs a grouping operation on the keys of this table, using the given number of partitions.
 <U> PTable<K,Pair<V,U>> join(PTable<K,U> other)
          Perform an inner join on this table and the one passed in as an argument on their common keys.
 PCollection<K> keys()
          Returns a PCollection made up of the keys in this PTable.
 <K2> PTable<K2,V> mapKeys(MapFn<K,K2> mapFn, PType<K2> ptype)
          Returns a PTable that has the same values as this instance, but uses the given function to map the keys.
 <K2> PTable<K2,V> mapKeys(String name, MapFn<K,K2> mapFn, PType<K2> ptype)
          Returns a PTable that has the same values as this instance, but uses the given function to map the keys.
 <U> PTable<K,U> mapValues(MapFn<V,U> mapFn, PType<U> ptype)
          Returns a PTable that has the same keys as this instance, but uses the given function to map the values.
 <U> PTable<K,U> mapValues(String name, MapFn<V,U> mapFn, PType<U> ptype)
          Returns a PTable that has the same keys as this instance, but uses the given function to map the values.
 Map<K,V> materializeToMap()
          Returns a Map made up of the keys and values in this PTable.
 PTable<K,V> top(int count)
          Returns a PTable made up of the pairs in this PTable with the largest value field.
 PTable<K,V> union(PTable<K,V>... others)
          Returns a PTable instance that acts as the union of this PTable and the input PTables.
 PTable<K,V> union(PTable<K,V> other)
          Returns a PTable instance that acts as the union of this PTable and the other PTable.
 PCollection<V> values()
          Returns a PCollection made up of the values in this PTable.
 PTable<K,V> write(Target target)
          Writes this PTable to the given Target.
 PTable<K,V> write(Target target, Target.WriteMode writeMode)
          Writes this PTable to the given Target, using the given Target.WriteMode to handle existing targets.
 
Methods inherited from interface org.apache.crunch.PCollection
aggregate, asCollection, asReadable, by, by, count, first, getName, getPipeline, getPType, getSize, getTypeFamily, length, materialize, max, min, parallelDo, parallelDo, parallelDo, parallelDo, parallelDo, parallelDo, sequentialDo, union, union
 

Method Detail

union

PTable<K,V> union(PTable<K,V> other)
Returns a PTable instance that acts as the union of this PTable and the other PTable.


union

PTable<K,V> union(PTable<K,V>... others)
Returns a PTable instance that acts as the union of this PTable and the input PTables.
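
As a rough illustration of what a union of multi-maps means, the plain-Java sketch below models a table as a list of key-value entries and concatenates the inputs. It does not use the Crunch API; the class and method names are hypothetical, and it assumes that duplicate pairs are retained (bag semantics), which the description above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class UnionSketch {
    // Hypothetical model: a "table" is a list of entries, so duplicates are allowed.
    @SafeVarargs
    static <K, V> List<Map.Entry<K, V>> union(List<Map.Entry<K, V>> first,
                                              List<Map.Entry<K, V>>... others) {
        List<Map.Entry<K, V>> out = new ArrayList<>(first);
        for (List<Map.Entry<K, V>> other : others) {
            out.addAll(other); // assumption: no de-duplication takes place
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> a = List.of(Map.entry("x", 1));
        List<Map.Entry<String, Integer>> b = List.of(Map.entry("x", 1), Map.entry("y", 2));
        System.out.println(union(a, b).size());
    }
}
```

In this model the pair ("x", 1) appears twice in the result, once from each input.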


groupByKey

PGroupedTable<K,V> groupByKey()
Performs a grouping operation on the keys of this table.

Returns:
a PGroupedTable instance that represents the grouping

groupByKey

PGroupedTable<K,V> groupByKey(int numPartitions)
Performs a grouping operation on the keys of this table, using the given number of partitions.

Parameters:
numPartitions - The number of partitions for the data.
Returns:
a PGroupedTable instance that represents this grouping

groupByKey

PGroupedTable<K,V> groupByKey(GroupingOptions options)
Performs a grouping operation on the keys of this table, using the additional GroupingOptions to control how the grouping is executed.

Parameters:
options - The grouping options to use
Returns:
a PGroupedTable instance that represents the grouping

write

PTable<K,V> write(Target target)
Writes this PTable to the given Target.

Specified by:
write in interface PCollection<Pair<K,V>>
Parameters:
target - The target to write to

write

PTable<K,V> write(Target target,
                  Target.WriteMode writeMode)
Writes this PTable to the given Target, using the given Target.WriteMode to handle existing targets.

Specified by:
write in interface PCollection<Pair<K,V>>
Parameters:
target - The target to write to
writeMode - The rule for handling existing outputs at the target location

cache

PTable<K,V> cache()
Description copied from interface: PCollection
Marks this data as cached using the default CachingOptions. Cached PCollections will only be processed once, and then their contents will be saved so that downstream code can process them many times.

Specified by:
cache in interface PCollection<Pair<K,V>>
Returns:
this PCollection instance

cache

PTable<K,V> cache(CachingOptions options)
Description copied from interface: PCollection
Marks this data as cached using the given CachingOptions. Cached PCollections will only be processed once and then their contents will be saved so that downstream code can process them many times.

Specified by:
cache in interface PCollection<Pair<K,V>>
Parameters:
options - the options that control the cache settings for the data
Returns:
this PCollection instance

getPTableType

PTableType<K,V> getPTableType()
Returns the PTableType of this PTable.


getKeyType

PType<K> getKeyType()
Returns the PType of the key.


getValueType

PType<V> getValueType()
Returns the PType of the value.


mapValues

<U> PTable<K,U> mapValues(MapFn<V,U> mapFn,
                          PType<U> ptype)
Returns a PTable that has the same keys as this instance, but uses the given function to map the values.


mapValues

<U> PTable<K,U> mapValues(String name,
                          MapFn<V,U> mapFn,
                          PType<U> ptype)
Returns a PTable that has the same keys as this instance, but uses the given function to map the values.


mapKeys

<K2> PTable<K2,V> mapKeys(MapFn<K,K2> mapFn,
                          PType<K2> ptype)
Returns a PTable that has the same values as this instance, but uses the given function to map the keys.


mapKeys

<K2> PTable<K2,V> mapKeys(String name,
                          MapFn<K,K2> mapFn,
                          PType<K2> ptype)
Returns a PTable that has the same values as this instance, but uses the given function to map the keys.
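
One consequence of mapping keys is that distinct keys may collapse onto the same new key; because a PTable is a multi-map, both pairs can coexist. The plain-Java sketch below models this with a list of entries. It is not Crunch code: `MapKeysSketch` and its `mapKeys` helper are hypothetical, and a `java.util.function.Function` stands in for `MapFn`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class MapKeysSketch {
    // Hypothetical model of mapKeys: values are untouched, each key is transformed.
    static <K, K2, V> List<Map.Entry<K2, V>> mapKeys(List<Map.Entry<K, V>> table,
                                                     Function<K, K2> fn) {
        List<Map.Entry<K2, V>> out = new ArrayList<>();
        for (Map.Entry<K, V> e : table) {
            out.add(Map.entry(fn.apply(e.getKey()), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> table =
            List.of(Map.entry("apple", 1), Map.entry("avocado", 2));
        // Both keys map to "a"; the result keeps both pairs.
        List<Map.Entry<String, Integer>> byInitial = mapKeys(table, k -> k.substring(0, 1));
        System.out.println(byInitial);
    }
}
```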


collectValues

PTable<K,Collection<V>> collectValues()
Aggregate all of the values with the same key into a single key-value pair in the returned PTable.
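
To make the aggregation concrete, here is a plain-Java sketch of the same idea on an in-memory list of entries. It is only a model of the semantics, not the Crunch implementation; `CollectValuesSketch` is a hypothetical name, and the ordering of values within each collection is an artifact of this sketch rather than a documented guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CollectValuesSketch {
    // Hypothetical model: gather every value that shares a key into one list.
    static <K, V> Map<K, List<V>> collectValues(List<Map.Entry<K, V>> table) {
        Map<K, List<V>> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : table) {
            out.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> table =
            List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        System.out.println(collectValues(table));
    }
}
```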


filter

PTable<K,V> filter(FilterFn<Pair<K,V>> filterFn)
Apply the given filter function to this instance and return the resulting PTable.

Specified by:
filter in interface PCollection<Pair<K,V>>

filter

PTable<K,V> filter(String name,
                   FilterFn<Pair<K,V>> filterFn)
Apply the given filter function to this instance and return the resulting PTable.

Specified by:
filter in interface PCollection<Pair<K,V>>
Parameters:
name - An identifier for this processing step
filterFn - The FilterFn to apply

top

PTable<K,V> top(int count)
Returns a PTable made up of the pairs in this PTable with the largest value field.

Parameters:
count - The number of pairs to return

bottom

PTable<K,V> bottom(int count)
Returns a PTable made up of the pairs in this PTable with the smallest value field.

Parameters:
count - The number of pairs to return
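
The selection performed by top and bottom can be sketched in plain Java as a sort by value followed by a limit. This is a model only: `TopBottomSketch` is hypothetical, it requires the value type to be `Comparable`, and the order in which the selected pairs are returned is a property of this sketch, not something the descriptions above promise.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TopBottomSketch {
    // Keep the `count` entries with the largest values.
    static <K, V extends Comparable<V>> List<Map.Entry<K, V>> top(
            List<Map.Entry<K, V>> table, int count) {
        return table.stream()
            .sorted(Map.Entry.<K, V>comparingByValue().reversed())
            .limit(count)
            .collect(Collectors.toList());
    }

    // Keep the `count` entries with the smallest values.
    static <K, V extends Comparable<V>> List<Map.Entry<K, V>> bottom(
            List<Map.Entry<K, V>> table, int count) {
        return table.stream()
            .sorted(Map.Entry.<K, V>comparingByValue())
            .limit(count)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> table =
            List.of(Map.entry("a", 3), Map.entry("b", 9), Map.entry("c", 5));
        System.out.println(top(table, 2));
        System.out.println(bottom(table, 1));
    }
}
```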

join

<U> PTable<K,Pair<V,U>> join(PTable<K,U> other)
Perform an inner join on this table and the one passed in as an argument on their common keys.
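
For readers unfamiliar with inner joins over multi-maps, the plain-Java sketch below shows the usual semantics: keys missing from either side are dropped, and a key with several values on one side yields one output pair per matching combination. It is a model with hypothetical names, not the Crunch join implementation, and it uses a nested loop purely for clarity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class InnerJoinSketch {
    // Hypothetical model: emit (key, (leftValue, rightValue)) for every matching pair.
    static <K, V, U> List<Map.Entry<K, Map.Entry<V, U>>> join(
            List<Map.Entry<K, V>> left, List<Map.Entry<K, U>> right) {
        List<Map.Entry<K, Map.Entry<V, U>>> out = new ArrayList<>();
        for (Map.Entry<K, V> l : left) {
            for (Map.Entry<K, U> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    out.add(Map.entry(l.getKey(), Map.entry(l.getValue(), r.getValue())));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> left =
            List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        List<Map.Entry<String, String>> right =
            List.of(Map.entry("a", "x"), Map.entry("c", "y"));
        // Keys "b" and "c" have no partner and do not appear in the result.
        System.out.println(join(left, right));
    }
}
```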


cogroup

<U> PTable<K,Pair<Collection<V>,Collection<U>>> cogroup(PTable<K,U> other)
Co-group operation with the given table on common keys.
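
The shape of the return type, a pair of collections per key, can be illustrated with a plain-Java model. The sketch below assumes that a key present in only one table is kept, with an empty collection on the other side; the description above does not spell this out, so treat that behavior as an assumption. `CogroupSketch` and its helpers are hypothetical names and do not come from Crunch.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CogroupSketch {
    private static <V, U> Map.Entry<List<V>, List<U>> emptyGroups() {
        return new AbstractMap.SimpleEntry<>(new ArrayList<V>(), new ArrayList<U>());
    }

    // Hypothetical model: per key, collect left values and right values separately.
    static <K, V, U> Map<K, Map.Entry<List<V>, List<U>>> cogroup(
            List<Map.Entry<K, V>> left, List<Map.Entry<K, U>> right) {
        Map<K, Map.Entry<List<V>, List<U>>> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : left) {
            out.computeIfAbsent(e.getKey(), k -> CogroupSketch.<V, U>emptyGroups())
               .getKey().add(e.getValue());
        }
        for (Map.Entry<K, U> e : right) {
            out.computeIfAbsent(e.getKey(), k -> CogroupSketch.<V, U>emptyGroups())
               .getValue().add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> left =
            List.of(Map.entry("a", 1), Map.entry("a", 2), Map.entry("b", 3));
        List<Map.Entry<String, String>> right =
            List.of(Map.entry("a", "x"), Map.entry("c", "y"));
        System.out.println(cogroup(left, right));
    }
}
```

Unlike the join sketch, nothing is multiplied out here: each key appears once, with all of its values from each side grouped together.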


keys

PCollection<K> keys()
Returns a PCollection made up of the keys in this PTable.


values

PCollection<V> values()
Returns a PCollection made up of the values in this PTable.


materializeToMap

Map<K,V> materializeToMap()
Returns a Map made up of the keys and values in this PTable.

Note: The contents of the returned map may not be exactly the same as this PTable, as a PTable is a multi-map (i.e. can contain multiple values for a single key).
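
The note above can be illustrated with a plain-Java sketch: when a multi-map is flattened into a `java.util.Map`, only one value per key survives. In this hypothetical model the last value seen wins; which value Crunch actually keeps is not specified here, so that choice is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ToMapSketch {
    // Hypothetical model: later entries overwrite earlier ones with the same key.
    static <K, V> Map<K, V> toMap(List<Map.Entry<K, V>> table) {
        Map<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : table) {
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> table =
            List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        // Three pairs go in, but the map holds only two keys.
        System.out.println(toMap(table).size());
    }
}
```

If every value per key matters, collecting the values first (as with collectValues) avoids this loss.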


asMap

PObject<Map<K,V>> asMap()
Returns a PObject encapsulating a Map made up of the keys and values in this PTable.

Note: The contents of the returned map may not be exactly the same as this PTable, as a PTable is a multi-map (i.e. can contain multiple values for a single key).

Returns:
The PObject encapsulating a Map made up of the keys and values in this PTable.


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.