This project has retired. For details please refer to its Attic page.
MemCollection (Apache Crunch 0.3.0-incubating API)

org.apache.crunch.impl.mem.collect
Class MemCollection<S>

java.lang.Object
  extended by org.apache.crunch.impl.mem.collect.MemCollection<S>
All Implemented Interfaces:
PCollection<S>
Direct Known Subclasses:
MemTable

public class MemCollection<S>
extends Object
implements PCollection<S>


Constructor Summary
MemCollection(Iterable<S> collect)
           
MemCollection(Iterable<S> collect, PType<S> ptype)
           
MemCollection(Iterable<S> collect, PType<S> ptype, String name)
           
 
Method Summary
<K> PTable<K,S>
by(MapFn<S,K> mapFn, PType<K> keyType)
          Apply the given map function to each element of this instance in order to create a PTable.
<K> PTable<K,S>
by(String name, MapFn<S,K> mapFn, PType<K> keyType)
          Apply the given map function to each element of this instance in order to create a PTable.
 PTable<S,Long> count()
          Returns a PTable instance that contains the counts of each unique element of this PCollection.
 PCollection<S> filter(FilterFn<S> filterFn)
          Apply the given filter function to this instance and return the resulting PCollection.
 PCollection<S> filter(String name, FilterFn<S> filterFn)
          Apply the given filter function to this instance and return the resulting PCollection.
 Collection<S> getCollection()
           
 String getName()
          Returns a shorthand name for this PCollection.
 Pipeline getPipeline()
          Returns the Pipeline associated with this PCollection.
 PType<S> getPType()
          Returns the PType of this PCollection.
 long getSize()
          Returns the size of the data represented by this PCollection in bytes.
 PTypeFamily getTypeFamily()
          Returns the PTypeFamily of this PCollection.
 Iterable<S> materialize()
          Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally.
 PCollection<S> max()
          Returns a PCollection made up of only the maximum element of this instance.
 PCollection<S> min()
          Returns a PCollection made up of only the minimum element of this instance.
<K,V> PTable<K,V>
parallelDo(DoFn<S,Pair<K,V>> doFn, PTableType<K,V> type)
          Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection.
<T> PCollection<T>
parallelDo(DoFn<S,T> doFn, PType<T> type)
          Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
<K,V> PTable<K,V>
parallelDo(String name, DoFn<S,Pair<K,V>> doFn, PTableType<K,V> type)
          Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection.
<T> PCollection<T>
parallelDo(String name, DoFn<S,T> doFn, PType<T> type)
          Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
 PCollection<S> sample(double acceptanceProbability)
          Randomly sample items from this PCollection instance with the given probability of an item being accepted.
 PCollection<S> sample(double acceptanceProbability, long seed)
          Randomly sample items from this PCollection instance with the given probability of an item being accepted and using the given seed.
 PCollection<S> sort(boolean ascending)
          Returns a PCollection instance that contains all of the elements of this instance in sorted order.
 String toString()
           
 PCollection<S> union(PCollection<S>... collections)
          Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.
 PCollection<S> write(Target target)
          Write the contents of this PCollection to the given Target, using the storage format specified by the target.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

MemCollection

public MemCollection(Iterable<S> collect)

MemCollection

public MemCollection(Iterable<S> collect,
                     PType<S> ptype)

MemCollection

public MemCollection(Iterable<S> collect,
                     PType<S> ptype,
                     String name)
Method Detail

getPipeline

public Pipeline getPipeline()
Description copied from interface: PCollection
Returns the Pipeline associated with this PCollection.

Specified by:
getPipeline in interface PCollection<S>

union

public PCollection<S> union(PCollection<S>... collections)
Description copied from interface: PCollection
Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.

Specified by:
union in interface PCollection<S>

parallelDo

public <T> PCollection<T> parallelDo(DoFn<S,T> doFn,
                                     PType<T> type)
Description copied from interface: PCollection
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
doFn - The DoFn to apply
type - The PType of the resulting PCollection
Returns:
a new PCollection

parallelDo

public <T> PCollection<T> parallelDo(String name,
                                     DoFn<S,T> doFn,
                                     PType<T> type)
Description copied from interface: PCollection
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
name - An identifier for this processing step, useful for debugging
doFn - The DoFn to apply
type - The PType of the resulting PCollection
Returns:
a new PCollection

parallelDo

public <K,V> PTable<K,V> parallelDo(DoFn<S,Pair<K,V>> doFn,
                                    PTableType<K,V> type)
Description copied from interface: PCollection
Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
doFn - The DoFn to apply
type - The PTableType of the resulting PTable
Returns:
a new PTable

parallelDo

public <K,V> PTable<K,V> parallelDo(String name,
                                    DoFn<S,Pair<K,V>> doFn,
                                    PTableType<K,V> type)
Description copied from interface: PCollection
Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
name - An identifier for this processing step
doFn - The DoFn to apply
type - The PTableType of the resulting PTable
Returns:
a new PTable

write

public PCollection<S> write(Target target)
Description copied from interface: PCollection
Write the contents of this PCollection to the given Target, using the storage format specified by the target.

Specified by:
write in interface PCollection<S>
Parameters:
target - The target to write to

materialize

public Iterable<S> materialize()
Description copied from interface: PCollection
Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally.

Specified by:
materialize in interface PCollection<S>

getCollection

public Collection<S> getCollection()

getPType

public PType<S> getPType()
Description copied from interface: PCollection
Returns the PType of this PCollection.

Specified by:
getPType in interface PCollection<S>

getTypeFamily

public PTypeFamily getTypeFamily()
Description copied from interface: PCollection
Returns the PTypeFamily of this PCollection.

Specified by:
getTypeFamily in interface PCollection<S>

getSize

public long getSize()
Description copied from interface: PCollection
Returns the size of the data represented by this PCollection in bytes.

Specified by:
getSize in interface PCollection<S>

getName

public String getName()
Description copied from interface: PCollection
Returns a shorthand name for this PCollection.

Specified by:
getName in interface PCollection<S>

toString

public String toString()
Overrides:
toString in class Object

count

public PTable<S,Long> count()
Description copied from interface: PCollection
Returns a PTable instance that contains the counts of each unique element of this PCollection.

Specified by:
count in interface PCollection<S>

sample

public PCollection<S> sample(double acceptanceProbability)
Description copied from interface: PCollection
Randomly sample items from this PCollection instance with the given probability of an item being accepted.

Specified by:
sample in interface PCollection<S>

sample

public PCollection<S> sample(double acceptanceProbability,
                             long seed)
Description copied from interface: PCollection
Randomly sample items from this PCollection instance with the given probability of an item being accepted and using the given seed.

Specified by:
sample in interface PCollection<S>

max

public PCollection<S> max()
Description copied from interface: PCollection
Returns a PCollection made up of only the maximum element of this instance.

Specified by:
max in interface PCollection<S>

min

public PCollection<S> min()
Description copied from interface: PCollection
Returns a PCollection made up of only the minimum element of this instance.

Specified by:
min in interface PCollection<S>

sort

public PCollection<S> sort(boolean ascending)
Description copied from interface: PCollection
Returns a PCollection instance that contains all of the elements of this instance in sorted order.

Specified by:
sort in interface PCollection<S>

filter

public PCollection<S> filter(FilterFn<S> filterFn)
Description copied from interface: PCollection
Apply the given filter function to this instance and return the resulting PCollection.

Specified by:
filter in interface PCollection<S>

filter

public PCollection<S> filter(String name,
                             FilterFn<S> filterFn)
Description copied from interface: PCollection
Apply the given filter function to this instance and return the resulting PCollection.

Specified by:
filter in interface PCollection<S>
Parameters:
name - An identifier for this processing step
filterFn - The FilterFn to apply

by

public <K> PTable<K,S> by(MapFn<S,K> mapFn,
                          PType<K> keyType)
Description copied from interface: PCollection
Apply the given map function to each element of this instance in order to create a PTable.

Specified by:
by in interface PCollection<S>

by

public <K> PTable<K,S> by(String name,
                          MapFn<S,K> mapFn,
                          PType<K> keyType)
Description copied from interface: PCollection
Apply the given map function to each element of this instance in order to create a PTable.

Specified by:
by in interface PCollection<S>
Parameters:
name - An identifier for this processing step
mapFn - The MapFn to apply


Copyright © 2012 The Apache Software Foundation. All Rights Reserved.