This project has been retired. For details, please refer to its Attic page.
MemTable (Apache Crunch 0.9.0 API)

org.apache.crunch.impl.mem.collect
Class MemTable<K,V>

java.lang.Object
  extended by org.apache.crunch.impl.mem.collect.MemCollection<Pair<K,V>>
      extended by org.apache.crunch.impl.mem.collect.MemTable<K,V>
All Implemented Interfaces:
PCollection<Pair<K,V>>, PTable<K,V>

public class MemTable<K,V>
extends MemCollection<Pair<K,V>>
implements PTable<K,V>


Constructor Summary
MemTable(Iterable<Pair<K,V>> collect)
MemTable(Iterable<Pair<K,V>> collect, PTableType<K,V> ptype, String name)

Method Summary
 PObject<Map<K,V>> asMap()
          Returns a PObject encapsulating a Map made up of the keys and values in this PTable.
 PTable<K,V> bottom(int count)
          Returns a PTable made up of the pairs in this PTable with the smallest value field.
 PTable<K,V> cache()
          Marks this data as cached using the default CachingOptions.
 PTable<K,V> cache(CachingOptions options)
          Marks this data as cached using the given CachingOptions.
<U> PTable<K,Pair<Collection<V>,Collection<U>>> cogroup(PTable<K,U> other)
          Performs a co-group operation with the given table on their common keys.
 PTable<K,Collection<V>> collectValues()
          Aggregate all of the values with the same key into a single key-value pair in the returned PTable.
 PTable<K,V> filter(FilterFn<Pair<K,V>> filterFn)
          Apply the given filter function to this instance and return the resulting PCollection.
 PTable<K,V> filter(String name, FilterFn<Pair<K,V>> filterFn)
          Apply the given filter function to this instance and return the resulting PCollection.
 PType<K> getKeyType()
          Returns the PType of the key.
 PTableType<K,V> getPTableType()
          Returns the PTableType of this PTable.
 PType<V> getValueType()
          Returns the PType of the value.
 PGroupedTable<K,V> groupByKey()
          Performs a grouping operation on the keys of this table.
 PGroupedTable<K,V> groupByKey(GroupingOptions options)
          Performs a grouping operation on the keys of this table, using the additional GroupingOptions to control how the grouping is executed.
 PGroupedTable<K,V> groupByKey(int numPartitions)
          Performs a grouping operation on the keys of this table, using the given number of partitions.
<U> PTable<K,Pair<V,U>> join(PTable<K,U> other)
          Perform an inner join on this table and the one passed in as an argument on their common keys.
 PCollection<K> keys()
          Returns a PCollection made up of the keys in this PTable.
<K2> PTable<K2,V> mapKeys(MapFn<K,K2> mapFn, PType<K2> ptype)
          Returns a PTable that has the same values as this instance, but uses the given function to map the keys.
<K2> PTable<K2,V> mapKeys(String name, MapFn<K,K2> mapFn, PType<K2> ptype)
          Returns a PTable that has the same values as this instance, but uses the given function to map the keys.
<U> PTable<K,U> mapValues(MapFn<V,U> mapFn, PType<U> ptype)
          Returns a PTable that has the same keys as this instance, but uses the given function to map the values.
<U> PTable<K,U> mapValues(String name, MapFn<V,U> mapFn, PType<U> ptype)
          Returns a PTable that has the same keys as this instance, but uses the given function to map the values.
 Map<K,V> materializeToMap()
          Returns a Map made up of the keys and values in this PTable.
 PTable<K,V> top(int count)
          Returns a PTable made up of the pairs in this PTable with the largest value field.
 PTable<K,V> union(PTable<K,V>... others)
          Returns a PTable instance that acts as the union of this PTable and the input PTables.
 PTable<K,V> union(PTable<K,V> other)
          Returns a PTable instance that acts as the union of this PTable and the other PTable.
 PCollection<V> values()
          Returns a PCollection made up of the values in this PTable.
 PTable<K,V> write(Target target)
          Write the contents of this PCollection to the given Target, using the storage format specified by the target.
 PTable<K,V> write(Target target, Target.WriteMode writeMode)
          Write the contents of this PCollection to the given Target, using the given Target.WriteMode to handle existing targets.
 
Methods inherited from class org.apache.crunch.impl.mem.collect.MemCollection
asCollection, asReadable, by, by, count, getCollection, getName, getPipeline, getPType, getSize, getTypeFamily, length, materialize, max, min, parallelDo, parallelDo, parallelDo, parallelDo, parallelDo, parallelDo, toString, union, union
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.apache.crunch.PCollection
asCollection, asReadable, by, by, count, getName, getPipeline, getPType, getSize, getTypeFamily, length, materialize, max, min, parallelDo, parallelDo, parallelDo, parallelDo, parallelDo, parallelDo, union, union
 

Constructor Detail

MemTable

public MemTable(Iterable<Pair<K,V>> collect)

MemTable

public MemTable(Iterable<Pair<K,V>> collect,
                PTableType<K,V> ptype,
                String name)

Method Detail

union

public PTable<K,V> union(PTable<K,V> other)
Description copied from interface: PTable
Returns a PTable instance that acts as the union of this PTable and the other PTable.

Specified by:
union in interface PTable<K,V>

union

public PTable<K,V> union(PTable<K,V>... others)
Description copied from interface: PTable
Returns a PTable instance that acts as the union of this PTable and the input PTables.

Specified by:
union in interface PTable<K,V>
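The union semantics can be pictured with a small plain-Java sketch. It assumes union behaves like a bag union: every pair from every input is kept, duplicates included, and nothing is deduplicated. UnionSketch is a hypothetical name, and java.util.Map.Entry stands in for Crunch's Pair so the code runs without Crunch on the classpath. It illustrates the behavior only; it is not MemTable's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch of bag-union semantics; Map.Entry stands in for Crunch's Pair.
class UnionSketch {
  @SafeVarargs
  static <K, V> List<Map.Entry<K, V>> union(List<Map.Entry<K, V>> first,
                                            List<Map.Entry<K, V>>... others) {
    // Start with every pair of the first table, then append each other table in order.
    List<Map.Entry<K, V>> result = new ArrayList<>(first);
    for (List<Map.Entry<K, V>> other : others) {
      result.addAll(other);
    }
    return result; // duplicates are kept; no deduplication happens
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> a = List.of(Map.entry("x", 1), Map.entry("y", 2));
    List<Map.Entry<String, Integer>> b = List.of(Map.entry("x", 1));
    System.out.println(union(a, b)); // [x=1, y=2, x=1]
  }
}
```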

groupByKey

public PGroupedTable<K,V> groupByKey()
Description copied from interface: PTable
Performs a grouping operation on the keys of this table.

Specified by:
groupByKey in interface PTable<K,V>
Returns:
a PGroupedTable instance that represents the grouping

groupByKey

public PGroupedTable<K,V> groupByKey(int numPartitions)
Description copied from interface: PTable
Performs a grouping operation on the keys of this table, using the given number of partitions.

Specified by:
groupByKey in interface PTable<K,V>
Parameters:
numPartitions - The number of partitions for the data.
Returns:
a PGroupedTable instance that represents this grouping

groupByKey

public PGroupedTable<K,V> groupByKey(GroupingOptions options)
Description copied from interface: PTable
Performs a grouping operation on the keys of this table, using the additional GroupingOptions to control how the grouping is executed.

Specified by:
groupByKey in interface PTable<K,V>
Parameters:
options - The grouping options to use
Returns:
a PGroupedTable instance that represents the grouping

write

public PTable<K,V> write(Target target)
Description copied from interface: PCollection
Write the contents of this PCollection to the given Target, using the storage format specified by the target.

Specified by:
write in interface PCollection<Pair<K,V>>
Specified by:
write in interface PTable<K,V>
Overrides:
write in class MemCollection<Pair<K,V>>
Parameters:
target - The target to write to

write

public PTable<K,V> write(Target target,
                         Target.WriteMode writeMode)
Description copied from interface: PCollection
Write the contents of this PCollection to the given Target, using the given Target.WriteMode to handle existing targets.

Specified by:
write in interface PCollection<Pair<K,V>>
Specified by:
write in interface PTable<K,V>
Overrides:
write in class MemCollection<Pair<K,V>>
Parameters:
target - The target
writeMode - The rule for handling existing outputs at the target location

cache

public PTable<K,V> cache()
Description copied from interface: PCollection
Marks this data as cached using the default CachingOptions. Cached PCollections will only be processed once, and then their contents will be saved so that downstream code can process them many times.

Specified by:
cache in interface PCollection<Pair<K,V>>
Specified by:
cache in interface PTable<K,V>
Overrides:
cache in class MemCollection<Pair<K,V>>
Returns:
this PCollection instance

cache

public PTable<K,V> cache(CachingOptions options)
Description copied from interface: PCollection
Marks this data as cached using the given CachingOptions. Cached PCollections will only be processed once and then their contents will be saved so that downstream code can process them many times.

Specified by:
cache in interface PCollection<Pair<K,V>>
Specified by:
cache in interface PTable<K,V>
Overrides:
cache in class MemCollection<Pair<K,V>>
Parameters:
options - the options that control the cache settings for the data
Returns:
this PCollection instance

getPTableType

public PTableType<K,V> getPTableType()
Description copied from interface: PTable
Returns the PTableType of this PTable.

Specified by:
getPTableType in interface PTable<K,V>

getKeyType

public PType<K> getKeyType()
Description copied from interface: PTable
Returns the PType of the key.

Specified by:
getKeyType in interface PTable<K,V>

getValueType

public PType<V> getValueType()
Description copied from interface: PTable
Returns the PType of the value.

Specified by:
getValueType in interface PTable<K,V>

filter

public PTable<K,V> filter(FilterFn<Pair<K,V>> filterFn)
Description copied from interface: PCollection
Apply the given filter function to this instance and return the resulting PCollection.

Specified by:
filter in interface PCollection<Pair<K,V>>
Specified by:
filter in interface PTable<K,V>
Overrides:
filter in class MemCollection<Pair<K,V>>

filter

public PTable<K,V> filter(String name,
                          FilterFn<Pair<K,V>> filterFn)
Description copied from interface: PCollection
Apply the given filter function to this instance and return the resulting PCollection.

Specified by:
filter in interface PCollection<Pair<K,V>>
Specified by:
filter in interface PTable<K,V>
Overrides:
filter in class MemCollection<Pair<K,V>>
Parameters:
name - An identifier for this processing step
filterFn - The FilterFn to apply
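A filter keeps exactly the pairs that the function accepts. In Crunch that role is played by a FilterFn<Pair<K,V>> whose accept method returns true for the pairs to keep. The sketch below is a hypothetical plain-Java stand-in: java.util.function.Predicate replaces FilterFn and Map.Entry replaces Pair, so it models the semantics without depending on Crunch.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: Predicate stands in for FilterFn<Pair<K,V>>, Map.Entry for Pair.
class FilterSketch {
  static <K, V> List<Map.Entry<K, V>> filter(List<Map.Entry<K, V>> table,
                                             Predicate<Map.Entry<K, V>> accept) {
    // Keep only the pairs the predicate accepts, preserving their order.
    return table.stream().filter(accept).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> table =
        List.of(Map.entry("a", 5), Map.entry("b", -1), Map.entry("c", 7));
    // Keep pairs whose value is positive.
    System.out.println(filter(table, e -> e.getValue() > 0)); // [a=5, c=7]
  }
}
```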

mapValues

public <U> PTable<K,U> mapValues(MapFn<V,U> mapFn,
                                 PType<U> ptype)
Description copied from interface: PTable
Returns a PTable that has the same keys as this instance, but uses the given function to map the values.

Specified by:
mapValues in interface PTable<K,V>

mapValues

public <U> PTable<K,U> mapValues(String name,
                                 MapFn<V,U> mapFn,
                                 PType<U> ptype)
Description copied from interface: PTable
Returns a PTable that has the same keys as this instance, but uses the given function to map the values.

Specified by:
mapValues in interface PTable<K,V>
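To make the "same keys, mapped values" contract concrete, here is a hypothetical plain-Java sketch. java.util.function.Function stands in for MapFn<V,U> and Map.Entry for Pair; the PType<U> argument that Crunch requires (to describe how the new values are serialized) has no counterpart in plain Java and is omitted.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: Function stands in for MapFn<V,U>; no PType is needed in plain Java.
class MapValuesSketch {
  static <K, V, U> List<Map.Entry<K, U>> mapValues(List<Map.Entry<K, V>> table,
                                                   Function<V, U> fn) {
    // Each key is carried over unchanged; only the value goes through fn.
    return table.stream()
        .map(e -> Map.entry(e.getKey(), fn.apply(e.getValue())))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> table = List.of(Map.entry("a", 1), Map.entry("b", 2));
    // Keys "a" and "b" are kept; each value is multiplied by 10.
    System.out.println(mapValues(table, v -> v * 10)); // [a=10, b=20]
  }
}
```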

mapKeys

public <K2> PTable<K2,V> mapKeys(MapFn<K,K2> mapFn,
                                 PType<K2> ptype)
Description copied from interface: PTable
Returns a PTable that has the same values as this instance, but uses the given function to map the keys.

Specified by:
mapKeys in interface PTable<K,V>

mapKeys

public <K2> PTable<K2,V> mapKeys(String name,
                                 MapFn<K,K2> mapFn,
                                 PType<K2> ptype)
Description copied from interface: PTable
Returns a PTable that has the same values as this instance, but uses the given function to map the keys.

Specified by:
mapKeys in interface PTable<K,V>
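One consequence worth seeing explicitly: mapping keys is a per-pair transformation, so two distinct keys that map to the same new key are not merged, and the result is still a multi-map. The hypothetical plain-Java sketch below assumes that behavior, with Function standing in for MapFn<K,K2> and Map.Entry for Pair.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: Function stands in for MapFn<K,K2>, Map.Entry for Pair.
class MapKeysSketch {
  static <K, K2, V> List<Map.Entry<K2, V>> mapKeys(List<Map.Entry<K, V>> table,
                                                   Function<K, K2> fn) {
    // Only the key goes through fn; the value is carried over unchanged.
    return table.stream()
        .map(e -> Map.entry(fn.apply(e.getKey()), e.getValue()))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> table =
        List.of(Map.entry("apple", 1), Map.entry("avocado", 2));
    // Both keys map to "a"; the pairs are not merged, so two "a" entries remain.
    System.out.println(mapKeys(table, k -> k.substring(0, 1))); // [a=1, a=2]
  }
}
```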

top

public PTable<K,V> top(int count)
Description copied from interface: PTable
Returns a PTable made up of the pairs in this PTable with the largest value field.

Specified by:
top in interface PTable<K,V>
Parameters:
count - The number of pairs to return

bottom

public PTable<K,V> bottom(int count)
Description copied from interface: PTable
Returns a PTable made up of the pairs in this PTable with the smallest value field.

Specified by:
bottom in interface PTable<K,V>
Parameters:
count - The number of pairs to return
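Both top and bottom rank pairs by their value field only; keys play no part in the ranking. The hypothetical plain-Java sketch below models this by sorting all pairs in memory, which is a simplification for illustration. How ties are broken and in what order the selected pairs come back are assumptions of the sketch (here: a stable sort, largest or smallest first), not documented Crunch behavior.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of top/bottom by value; Map.Entry stands in for Crunch's Pair.
class TopBottomSketch {
  static <K, V extends Comparable<? super V>> List<Map.Entry<K, V>> top(
      List<Map.Entry<K, V>> table, int count) {
    return firstN(table, count, Comparator.<V>reverseOrder()); // largest values first
  }

  static <K, V extends Comparable<? super V>> List<Map.Entry<K, V>> bottom(
      List<Map.Entry<K, V>> table, int count) {
    return firstN(table, count, Comparator.<V>naturalOrder()); // smallest values first
  }

  private static <K, V> List<Map.Entry<K, V>> firstN(
      List<Map.Entry<K, V>> table, int count, Comparator<V> byValue) {
    List<Map.Entry<K, V>> sorted = new ArrayList<>(table);
    // Order pairs by their value field only (List.sort is stable for equal values).
    sorted.sort(Map.Entry.comparingByValue(byValue));
    return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> table = List.of(
        Map.entry("a", 3), Map.entry("b", 9), Map.entry("c", 1), Map.entry("d", 5));
    System.out.println(top(table, 2));    // [b=9, d=5]
    System.out.println(bottom(table, 2)); // [c=1, a=3]
  }
}
```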

collectValues

public PTable<K,Collection<V>> collectValues()
Description copied from interface: PTable
Aggregate all of the values with the same key into a single key-value pair in the returned PTable.

Specified by:
collectValues in interface PTable<K,V>
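The effect of collectValues, one output pair per distinct key whose value holds every input value for that key, can be modeled with a map from key to list. The plain-Java sketch below is hypothetical: Map.Entry stands in for Pair, and the TreeMap is used only so the sketch's output order is deterministic; no ordering guarantee from Crunch is implied.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of collectValues: one output entry per distinct key.
class CollectValuesSketch {
  static <K extends Comparable<? super K>, V> Map<K, Collection<V>> collectValues(
      List<Map.Entry<K, V>> table) {
    // TreeMap is used only to make the sketch's output order deterministic.
    Map<K, Collection<V>> grouped = new TreeMap<>();
    for (Map.Entry<K, V> e : table) {
      // Create the value list the first time a key is seen, then append to it.
      grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
    }
    return grouped;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> table =
        List.of(Map.entry("b", 2), Map.entry("a", 1), Map.entry("b", 3));
    System.out.println(collectValues(table)); // {a=[1], b=[2, 3]}
  }
}
```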

join

public <U> PTable<K,Pair<V,U>> join(PTable<K,U> other)
Description copied from interface: PTable
Perform an inner join on this table and the one passed in as an argument on their common keys.

Specified by:
join in interface PTable<K,V>
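Inner-join semantics on a multi-map: for each key present in both tables, every left value is paired with every right value, and keys found in only one table are dropped. The hypothetical plain-Java sketch below shows this with nested loops, which is fine for tiny in-memory inputs but is not how a distributed join would be executed. Map.Entry stands in for Pair, including the nested Pair<V,U> value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of inner-join semantics; Map.Entry stands in for Crunch's Pair.
class JoinSketch {
  static <K, V, U> List<Map.Entry<K, Map.Entry<V, U>>> join(
      List<Map.Entry<K, V>> left, List<Map.Entry<K, U>> right) {
    List<Map.Entry<K, Map.Entry<V, U>>> out = new ArrayList<>();
    for (Map.Entry<K, V> l : left) {
      for (Map.Entry<K, U> r : right) {
        // Emit one output pair for every left/right combination that shares a key.
        if (l.getKey().equals(r.getKey())) {
          out.add(Map.entry(l.getKey(), Map.entry(l.getValue(), r.getValue())));
        }
      }
    }
    return out; // keys found in only one input are dropped (inner join)
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> left =
        List.of(Map.entry("k1", 1), Map.entry("k1", 2), Map.entry("k2", 3));
    List<Map.Entry<String, String>> right =
        List.of(Map.entry("k1", "x"), Map.entry("k3", "y"));
    System.out.println(join(left, right)); // [k1=1=x, k1=2=x]
  }
}
```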

cogroup

public <U> PTable<K,Pair<Collection<V>,Collection<U>>> cogroup(PTable<K,U> other)
Description copied from interface: PTable
Performs a co-group operation with the given table on their common keys.

Specified by:
cogroup in interface PTable<K,V>
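Unlike join, a co-group keeps every key that appears in either table and gathers all values from each side into a pair of collections; a side with no values for that key contributes an empty collection. The plain-Java sketch below assumes those semantics. CogroupSketch is a hypothetical name, Map.Entry stands in for Pair, and the TreeMap only fixes the sketch's output order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of co-group semantics; Map.Entry stands in for Crunch's Pair.
class CogroupSketch {
  static <K extends Comparable<? super K>, V, U>
      Map<K, Map.Entry<Collection<V>, Collection<U>>> cogroup(
          List<Map.Entry<K, V>> left, List<Map.Entry<K, U>> right) {
    // TreeMap only makes the output order deterministic for this sketch.
    Map<K, Map.Entry<Collection<V>, Collection<U>>> out = new TreeMap<>();
    for (Map.Entry<K, V> e : left) {
      slot(out, e.getKey()).getKey().add(e.getValue());
    }
    for (Map.Entry<K, U> e : right) {
      slot(out, e.getKey()).getValue().add(e.getValue());
    }
    return out;
  }

  // Fetch (or create) the pair of value collections for a key; either side may stay empty.
  private static <K, V, U> Map.Entry<Collection<V>, Collection<U>> slot(
      Map<K, Map.Entry<Collection<V>, Collection<U>>> out, K key) {
    return out.computeIfAbsent(key, k -> Map.entry(new ArrayList<V>(), new ArrayList<U>()));
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> left =
        List.of(Map.entry("a", 1), Map.entry("a", 2), Map.entry("b", 3));
    List<Map.Entry<String, String>> right = List.of(Map.entry("a", "x"), Map.entry("c", "y"));
    cogroup(left, right).forEach((k, v) ->
        System.out.println(k + " -> left " + v.getKey() + ", right " + v.getValue()));
    // a -> left [1, 2], right [x]
    // b -> left [3], right []
    // c -> left [], right [y]
  }
}
```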

keys

public PCollection<K> keys()
Description copied from interface: PTable
Returns a PCollection made up of the keys in this PTable.

Specified by:
keys in interface PTable<K,V>

values

public PCollection<V> values()
Description copied from interface: PTable
Returns a PCollection made up of the values in this PTable.

Specified by:
values in interface PTable<K,V>
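Because a PTable is a multi-map, keys() and values() are best thought of as one-element-per-pair projections, so a repeated key shows up repeatedly in keys(). The hypothetical plain-Java sketch below assumes that no deduplication takes place; Map.Entry stands in for Pair and a List stands in for the returned PCollection.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: projecting a list of pairs onto its keys or its values.
class KeysValuesSketch {
  static <K, V> List<K> keys(List<Map.Entry<K, V>> table) {
    // One output element per pair, so a repeated key appears repeatedly.
    return table.stream().map(Map.Entry::getKey).collect(Collectors.toList());
  }

  static <K, V> List<V> values(List<Map.Entry<K, V>> table) {
    return table.stream().map(Map.Entry::getValue).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> table =
        List.of(Map.entry("k", 1), Map.entry("k", 2), Map.entry("j", 3));
    System.out.println(keys(table));   // [k, k, j]
    System.out.println(values(table)); // [1, 2, 3]
  }
}
```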

materializeToMap

public Map<K,V> materializeToMap()
Description copied from interface: PTable
Returns a Map made up of the keys and values in this PTable.

Note: The contents of the returned map may not be exactly the same as this PTable, as a PTable is a multi-map (i.e., it can contain multiple values for a single key).

Specified by:
materializeToMap in interface PTable<K,V>
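The multi-map caveat above can be seen by collapsing a list of pairs into an ordinary java.util.Map. In the hypothetical plain-Java sketch below, a later value for a repeated key overwrites the earlier one; which value Crunch actually keeps for a repeated key is not stated here, so treat the "last one wins" rule as an assumption of the sketch. Map.Entry stands in for Pair.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapsing a multi-map of pairs into an ordinary Map.
class MaterializeToMapSketch {
  static <K, V> Map<K, V> materializeToMap(List<Map.Entry<K, V>> table) {
    Map<K, V> map = new HashMap<>();
    for (Map.Entry<K, V> e : table) {
      // A repeated key overwrites the earlier value, so only one value per key survives.
      map.put(e.getKey(), e.getValue());
    }
    return map;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> table =
        List.of(Map.entry("k", 1), Map.entry("k", 2), Map.entry("j", 3));
    Map<String, Integer> map = materializeToMap(table);
    System.out.println(table.size() + " pairs -> " + map.size() + " map entries"); // 3 pairs -> 2 map entries
  }
}
```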

asMap

public PObject<Map<K,V>> asMap()
Description copied from interface: PTable
Returns a PObject encapsulating a Map made up of the keys and values in this PTable.

Note: The contents of the returned map may not be exactly the same as this PTable, as a PTable is a multi-map (i.e., it can contain multiple values for a single key).

Specified by:
asMap in interface PTable<K,V>
Returns:
The PObject encapsulating a Map made up of the keys and values in this PTable.


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.