This project has retired. For details please refer to its Attic page.
MemCollection (Apache Crunch 0.9.0 API)

org.apache.crunch.impl.mem.collect
Class MemCollection<S>

java.lang.Object
  extended by org.apache.crunch.impl.mem.collect.MemCollection<S>
All Implemented Interfaces:
PCollection<S>
Direct Known Subclasses:
MemTable

public class MemCollection<S>
extends Object
implements PCollection<S>


Constructor Summary
MemCollection(Iterable<S> collect)
MemCollection(Iterable<S> collect, PType<S> ptype)
MemCollection(Iterable<S> collect, PType<S> ptype, String name)
 
Method Summary
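 PObject<Collection<S>> asCollection()
          Returns a PObject encapsulating an in-memory Collection containing the values of this PCollection.
 ReadableData<S> asReadable(boolean materialize)
          Returns a reference to the data in this instance that can be read from a job running on a cluster.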
 <K> PTable<K,S> by(MapFn<S,K> mapFn, PType<K> keyType)
          Apply the given map function to each element of this instance in order to create a PTable.
 <K> PTable<K,S> by(String name, MapFn<S,K> mapFn, PType<K> keyType)
          Apply the given map function to each element of this instance in order to create a PTable.
 PCollection<S> cache()
          Marks this data as cached using the default CachingOptions.
 PCollection<S> cache(CachingOptions options)
          Marks this data as cached using the given CachingOptions.
 PTable<S,Long> count()
          Returns a PTable instance that contains the counts of each unique element of this PCollection.
 PCollection<S> filter(FilterFn<S> filterFn)
          Apply the given filter function to this instance and return the resulting PCollection.
 PCollection<S> filter(String name, FilterFn<S> filterFn)
          Apply the given filter function to this instance and return the resulting PCollection.
 Collection<S> getCollection()
           
 String getName()
          Returns a shorthand name for this PCollection.
 Pipeline getPipeline()
          Returns the Pipeline associated with this PCollection.
 PType<S> getPType()
          Returns the PType of this PCollection.
 long getSize()
          Returns the size of the data represented by this PCollection in bytes.
 PTypeFamily getTypeFamily()
          Returns the PTypeFamily of this PCollection.
 PObject<Long> length()
          Returns the number of elements represented by this PCollection.
 Iterable<S> materialize()
          Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally.
 PObject<S> max()
          Returns a PObject of the maximum element of this instance.
 PObject<S> min()
          Returns a PObject of the minimum element of this instance.
 <K,V> PTable<K,V> parallelDo(DoFn<S,Pair<K,V>> doFn, PTableType<K,V> type)
          Similar to the other parallelDo methods, but returns a PTable instance instead of a PCollection.
 <T> PCollection<T> parallelDo(DoFn<S,T> doFn, PType<T> type)
          Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
 <K,V> PTable<K,V> parallelDo(String name, DoFn<S,Pair<K,V>> doFn, PTableType<K,V> type)
          Similar to the other parallelDo methods, but returns a PTable instance instead of a PCollection.
 <K,V> PTable<K,V> parallelDo(String name, DoFn<S,Pair<K,V>> doFn, PTableType<K,V> type, ParallelDoOptions options)
          Similar to the other parallelDo methods, but returns a PTable instance instead of a PCollection.
 <T> PCollection<T> parallelDo(String name, DoFn<S,T> doFn, PType<T> type)
          Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
 <T> PCollection<T> parallelDo(String name, DoFn<S,T> doFn, PType<T> type, ParallelDoOptions options)
          Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
 String toString()
 PCollection<S> union(PCollection<S>... collections)
          Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.
 PCollection<S> union(PCollection<S> other)
          Returns a PCollection instance that acts as the union of this PCollection and the given PCollection.
 PCollection<S> write(Target target)
          Write the contents of this PCollection to the given Target, using the storage format specified by the target.
 PCollection<S> write(Target target, Target.WriteMode writeMode)
          Write the contents of this PCollection to the given Target, using the given Target.WriteMode to handle existing targets.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

MemCollection

public MemCollection(Iterable<S> collect)

MemCollection

public MemCollection(Iterable<S> collect,
                     PType<S> ptype)

MemCollection

public MemCollection(Iterable<S> collect,
                     PType<S> ptype,
                     String name)
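The constructors carry no descriptions of their own. The sketch below shows one plausible use of each, assuming the Crunch 0.9.0 jars and their Hadoop dependencies are on the classpath. `Writables.strings()` is the Writable-family PType for `String`; the class name `MemCollectionConstructors` and the name "fruit" are illustrative only, not part of the API.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.crunch.impl.mem.collect.MemCollection;
import org.apache.crunch.types.writable.Writables;

// Hypothetical helper class; only the MemCollection constructors are Crunch API.
class MemCollectionConstructors {
  static List<String> data() {
    return Arrays.asList("apple", "banana", "cherry");
  }

  // No PType supplied: operations that depend on the element type
  // (count(), by(...) and similar) will likely need a typed instance instead.
  static MemCollection<String> untyped() {
    return new MemCollection<String>(data());
  }

  // With a PType, the collection can report its PType and PTypeFamily.
  static MemCollection<String> typed() {
    return new MemCollection<String>(data(), Writables.strings());
  }

  // With a PType and a name; the name is what getName() reports.
  static MemCollection<String> named() {
    return new MemCollection<String>(data(), Writables.strings(), "fruit");
  }
}
```

Supplying a PType is the safer default, since most derived operations on this page build new PTypes from it.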
Method Detail

getPipeline

public Pipeline getPipeline()
Description copied from interface: PCollection
Returns the Pipeline associated with this PCollection.

Specified by:
getPipeline in interface PCollection<S>

union

public PCollection<S> union(PCollection<S> other)
Description copied from interface: PCollection
Returns a PCollection instance that acts as the union of this PCollection and the given PCollection.

Specified by:
union in interface PCollection<S>

union

public PCollection<S> union(PCollection<S>... collections)
Description copied from interface: PCollection
Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.

Specified by:
union in interface PCollection<S>
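A minimal sketch of `union` on two in-memory collections, under the same classpath assumption as above (Crunch 0.9.0 plus Hadoop). Duplicates survive, because the union of PCollections is a bag union rather than a set union. The result is sorted before it is returned so that the sketch does not depend on the order in which the inputs are concatenated; `UnionExample` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.crunch.PCollection;
import org.apache.crunch.impl.mem.collect.MemCollection;
import org.apache.crunch.types.writable.Writables;

class UnionExample {
  static List<String> unionSorted() {
    MemCollection<String> first =
        new MemCollection<String>(Arrays.asList("a", "b"), Writables.strings());
    MemCollection<String> second =
        new MemCollection<String>(Arrays.asList("c", "a"), Writables.strings());

    // Single-argument overload; the varargs form accepts several inputs at once.
    PCollection<String> all = first.union(second);

    // Copy the materialized result and sort it, so the outcome does not
    // depend on the order in which union() concatenates its inputs.
    List<String> out = new ArrayList<String>();
    for (String s : all.materialize()) {
      out.add(s);
    }
    Collections.sort(out);
    return out;
  }
}
```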

parallelDo

public <T> PCollection<T> parallelDo(DoFn<S,T> doFn,
                                     PType<T> type)
Description copied from interface: PCollection
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
doFn - The DoFn to apply
type - The PType of the resulting PCollection
Returns:
a new PCollection
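A sketch of the basic `parallelDo` overload, assuming Crunch 0.9.0 and Hadoop on the classpath. `SplitWordsFn` is a hypothetical DoFn that emits one output per whitespace-separated word, illustrating that a DoFn may emit zero, one, or many values per input element. The output PType is passed explicitly as the second argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.impl.mem.collect.MemCollection;
import org.apache.crunch.types.writable.Writables;

class SplitWordsExample {
  // Hypothetical DoFn: emits every non-empty word of each input line.
  static class SplitWordsFn extends DoFn<String, String> {
    @Override
    public void process(String line, Emitter<String> emitter) {
      for (String word : line.split("\\s+")) {
        if (!word.isEmpty()) {
          emitter.emit(word);
        }
      }
    }
  }

  static List<String> words() {
    MemCollection<String> lines = new MemCollection<String>(
        Arrays.asList("hello world", "hello crunch"), Writables.strings());
    // The second argument is the PType of the resulting PCollection.
    PCollection<String> words = lines.parallelDo(new SplitWordsFn(), Writables.strings());
    List<String> out = new ArrayList<String>();
    for (String w : words.materialize()) {
      out.add(w);
    }
    return out;
  }
}
```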

parallelDo

public <T> PCollection<T> parallelDo(String name,
                                     DoFn<S,T> doFn,
                                     PType<T> type)
Description copied from interface: PCollection
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
name - An identifier for this processing step, useful for debugging
doFn - The DoFn to apply
type - The PType of the resulting PCollection
Returns:
a new PCollection

parallelDo

public <T> PCollection<T> parallelDo(String name,
                                     DoFn<S,T> doFn,
                                     PType<T> type,
                                     ParallelDoOptions options)
Description copied from interface: PCollection
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
name - An identifier for this processing step, useful for debugging
doFn - The DoFn to apply
type - The PType of the resulting PCollection
options - Optional information that is needed for certain pipeline operations
Returns:
a new PCollection

parallelDo

public <K,V> PTable<K,V> parallelDo(DoFn<S,Pair<K,V>> doFn,
                                    PTableType<K,V> type)
Description copied from interface: PCollection
Similar to the other parallelDo methods, but returns a PTable instance instead of a PCollection.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
doFn - The DoFn to apply
type - The PTableType of the resulting PTable
Returns:
a new PTable
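A sketch of the PTable-returning overload, under the same Crunch 0.9.0 classpath assumption. Because a PTableType is also a PType of Pair, passing one from `Writables.tableOf(...)` makes Java choose this more specific overload. `WordLengthFn` is a hypothetical DoFn that pairs each word with its length.

```java
import java.util.Arrays;
import java.util.Map;

import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.impl.mem.collect.MemCollection;
import org.apache.crunch.types.writable.Writables;

class WordLengthTableExample {
  // Hypothetical DoFn: emits one (word, length) pair per input word.
  static class WordLengthFn extends DoFn<String, Pair<String, Integer>> {
    @Override
    public void process(String word, Emitter<Pair<String, Integer>> emitter) {
      emitter.emit(Pair.of(word, word.length()));
    }
  }

  static Map<String, Integer> lengths() {
    MemCollection<String> words = new MemCollection<String>(
        Arrays.asList("apple", "fig"), Writables.strings());
    // Passing a PTableType selects the PTable-returning overload.
    PTable<String, Integer> table = words.parallelDo(
        new WordLengthFn(), Writables.tableOf(Writables.strings(), Writables.ints()));
    return table.materializeToMap();
  }
}
```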

parallelDo

public <K,V> PTable<K,V> parallelDo(String name,
                                    DoFn<S,Pair<K,V>> doFn,
                                    PTableType<K,V> type)
Description copied from interface: PCollection
Similar to the other parallelDo methods, but returns a PTable instance instead of a PCollection.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
name - An identifier for this processing step
doFn - The DoFn to apply
type - The PTableType of the resulting PTable
Returns:
a new PTable

parallelDo

public <K,V> PTable<K,V> parallelDo(String name,
                                    DoFn<S,Pair<K,V>> doFn,
                                    PTableType<K,V> type,
                                    ParallelDoOptions options)
Description copied from interface: PCollection
Similar to the other parallelDo methods, but returns a PTable instance instead of a PCollection.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
name - An identifier for this processing step
doFn - The DoFn to apply
type - The PTableType of the resulting PTable
options - Optional information that is needed for certain pipeline operations
Returns:
a new PTable

write

public PCollection<S> write(Target target)
Description copied from interface: PCollection
Write the contents of this PCollection to the given Target, using the storage format specified by the target.

Specified by:
write in interface PCollection<S>
Parameters:
target - The target to write to

write

public PCollection<S> write(Target target,
                            Target.WriteMode writeMode)
Description copied from interface: PCollection
Write the contents of this PCollection to the given Target, using the given Target.WriteMode to handle existing targets.

Specified by:
write in interface PCollection<S>
Parameters:
target - The target to write to
writeMode - The rule for handling existing outputs at the target location

materialize

public Iterable<S> materialize()
Description copied from interface: PCollection
Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally.

Specified by:
materialize in interface PCollection<S>
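A short sketch of reading the data locally through `materialize()`, assuming Crunch 0.9.0 on the classpath. The Iterable it returns is consumed in client code; here it is used to sum word lengths. `MaterializeExample` is a hypothetical name.

```java
import java.util.Arrays;

import org.apache.crunch.impl.mem.collect.MemCollection;
import org.apache.crunch.types.writable.Writables;

class MaterializeExample {
  static int totalLength() {
    MemCollection<String> words =
        new MemCollection<String>(Arrays.asList("ab", "cde"), Writables.strings());
    int total = 0;
    // materialize() returns an Iterable that is read here, in the client.
    for (String w : words.materialize()) {
      total += w.length();
    }
    return total;
  }
}
```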

cache

public PCollection<S> cache()
Description copied from interface: PCollection
Marks this data as cached using the default CachingOptions. Cached PCollections will only be processed once, and then their contents will be saved so that downstream code can process them many times.

Specified by:
cache in interface PCollection<S>
Returns:
this PCollection instance

cache

public PCollection<S> cache(CachingOptions options)
Description copied from interface: PCollection
Marks this data as cached using the given CachingOptions. Cached PCollections will only be processed once, and then their contents will be saved so that downstream code can process them many times.

Specified by:
cache in interface PCollection<S>
Parameters:
options - the options that control the cache settings for the data
Returns:
this PCollection instance
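Both cache methods document "this PCollection instance" as their return value. The sketch below checks that for the no-argument form on an in-memory collection, assuming Crunch 0.9.0 on the classpath; `CacheExample` is a hypothetical name.

```java
import java.util.Arrays;

import org.apache.crunch.PCollection;
import org.apache.crunch.impl.mem.collect.MemCollection;
import org.apache.crunch.types.writable.Writables;

class CacheExample {
  static boolean cacheReturnsSameInstance() {
    MemCollection<String> data =
        new MemCollection<String>(Arrays.asList("x", "y"), Writables.strings());
    // Per the documented return value, cache() hands back this PCollection instance.
    PCollection<String> cached = data.cache();
    return cached == data;
  }
}
```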

asCollection

public PObject<Collection<S>> asCollection()

Specified by:
asCollection in interface PCollection<S>
Returns:
A PObject encapsulating an in-memory Collection containing the values of this PCollection.
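A sketch of `asCollection()`, assuming Crunch 0.9.0 on the classpath. `PObject.getValue()` yields the wrapped Collection; the values are copied into a sorted list so the result does not depend on the Collection's iteration order. `AsCollectionExample` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

import org.apache.crunch.PObject;
import org.apache.crunch.impl.mem.collect.MemCollection;
import org.apache.crunch.types.writable.Writables;

class AsCollectionExample {
  static List<String> sortedValues() {
    MemCollection<String> data =
        new MemCollection<String>(Arrays.asList("b", "a"), Writables.strings());
    PObject<Collection<String>> pobj = data.asCollection();
    // getValue() returns the in-memory Collection wrapped by the PObject.
    List<String> out = new ArrayList<String>(pobj.getValue());
    Collections.sort(out);
    return out;
  }
}
```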

asReadable

public ReadableData<S> asReadable(boolean materialize)

Specified by:
asReadable in interface PCollection<S>
Parameters:
materialize - If true, materialize this data before returning a reference to it
Returns:
A reference to the data in this instance that can be read from a job running on a cluster.

getCollection

public Collection<S> getCollection()

getPType

public PType<S> getPType()
Description copied from interface: PCollection
Returns the PType of this PCollection.

Specified by:
getPType in interface PCollection<S>

getTypeFamily

public PTypeFamily getTypeFamily()
Description copied from interface: PCollection
Returns the PTypeFamily of this PCollection.

Specified by:
getTypeFamily in interface PCollection<S>

getSize

public long getSize()
Description copied from interface: PCollection
Returns the size of the data represented by this PCollection in bytes.

Specified by:
getSize in interface PCollection<S>

getName

public String getName()
Description copied from interface: PCollection
Returns a shorthand name for this PCollection.

Specified by:
getName in interface PCollection<S>

toString

public String toString()
Overrides:
toString in class Object

count

public PTable<S,Long> count()
Description copied from interface: PCollection
Returns a PTable instance that contains the counts of each unique element of this PCollection.

Specified by:
count in interface PCollection<S>
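A sketch of `count()`, assuming Crunch 0.9.0 on the classpath. Each distinct element becomes a key of the resulting PTable, and its value is the number of times that element occurs. A typed collection is used because the operation needs the element PType. `CountExample` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Map;

import org.apache.crunch.PTable;
import org.apache.crunch.impl.mem.collect.MemCollection;
import org.apache.crunch.types.writable.Writables;

class CountExample {
  static Map<String, Long> counts() {
    MemCollection<String> words =
        new MemCollection<String>(Arrays.asList("a", "b", "a"), Writables.strings());
    // Each distinct element becomes a key; its value is the number of occurrences.
    PTable<String, Long> counts = words.count();
    return counts.materializeToMap();
  }
}
```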

length

public PObject<Long> length()
Description copied from interface: PCollection
Returns the number of elements represented by this PCollection.

Specified by:
length in interface PCollection<S>
Returns:
A PObject containing the number of elements in this PCollection.
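A sketch of `length()`, assuming Crunch 0.9.0 on the classpath. Unlike `count()`, it reports the total number of elements, so duplicates are included. `LengthExample` is a hypothetical name.

```java
import java.util.Arrays;

import org.apache.crunch.PObject;
import org.apache.crunch.impl.mem.collect.MemCollection;
import org.apache.crunch.types.writable.Writables;

class LengthExample {
  static long size() {
    MemCollection<String> words =
        new MemCollection<String>(Arrays.asList("a", "b", "a"), Writables.strings());
    // Duplicates count: length() reports elements, not distinct values.
    PObject<Long> n = words.length();
    return n.getValue();
  }
}
```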

max

public PObject<S> max()
Description copied from interface: PCollection
Returns a PObject of the maximum element of this instance.

Specified by:
max in interface PCollection<S>

min

public PObject<S> min()
Description copied from interface: PCollection
Returns a PObject of the minimum element of this instance.

Specified by:
min in interface PCollection<S>
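A sketch of `max()` and `min()`, assuming Crunch 0.9.0 on the classpath. The elements here are `Integer` values typed with `Writables.ints()`, so they have a natural ordering; `MinMaxExample` is a hypothetical name.

```java
import java.util.Arrays;

import org.apache.crunch.impl.mem.collect.MemCollection;
import org.apache.crunch.types.writable.Writables;

class MinMaxExample {
  // Integer elements are Comparable, so max() and min() have an ordering to use.
  static MemCollection<Integer> numbers() {
    return new MemCollection<Integer>(Arrays.asList(7, 3, 9, 1), Writables.ints());
  }

  static int largest() {
    return numbers().max().getValue();
  }

  static int smallest() {
    return numbers().min().getValue();
  }
}
```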

filter

public PCollection<S> filter(FilterFn<S> filterFn)
Description copied from interface: PCollection
Apply the given filter function to this instance and return the resulting PCollection.

Specified by:
filter in interface PCollection<S>

filter

public PCollection<S> filter(String name,
                             FilterFn<S> filterFn)
Description copied from interface: PCollection
Apply the given filter function to this instance and return the resulting PCollection.

Specified by:
filter in interface PCollection<S>
Parameters:
name - An identifier for this processing step
filterFn - The FilterFn to apply
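A sketch of `filter`, assuming Crunch 0.9.0 on the classpath. `LongWordFn` is a hypothetical FilterFn whose `accept` keeps words longer than three characters. The filtered collection keeps the PType of its input. The result is sorted so the check does not depend on iteration order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.crunch.FilterFn;
import org.apache.crunch.PCollection;
import org.apache.crunch.impl.mem.collect.MemCollection;
import org.apache.crunch.types.writable.Writables;

class FilterExample {
  // Hypothetical FilterFn: keeps only words longer than three characters.
  static class LongWordFn extends FilterFn<String> {
    @Override
    public boolean accept(String word) {
      return word.length() > 3;
    }
  }

  static List<String> longWords() {
    MemCollection<String> words = new MemCollection<String>(
        Arrays.asList("to", "be", "crunch", "data"), Writables.strings());
    PCollection<String> kept = words.filter(new LongWordFn());
    List<String> out = new ArrayList<String>();
    for (String w : kept.materialize()) {
      out.add(w);
    }
    Collections.sort(out);
    return out;
  }
}
```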

by

public <K> PTable<K,S> by(MapFn<S,K> mapFn,
                          PType<K> keyType)
Description copied from interface: PCollection
Apply the given map function to each element of this instance in order to create a PTable.

Specified by:
by in interface PCollection<S>

by

public <K> PTable<K,S> by(String name,
                          MapFn<S,K> mapFn,
                          PType<K> keyType)
Description copied from interface: PCollection
Apply the given map function to each element of this instance in order to create a PTable.

Specified by:
by in interface PCollection<S>
Parameters:
name - An identifier for this processing step
mapFn - The MapFn to apply
keyType - The PType of the keys in the resulting PTable
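A sketch of `by`, assuming Crunch 0.9.0 on the classpath. The MapFn output becomes the key and the original element becomes the value. No grouping happens, so repeated keys ("a" below) remain as separate entries. `FirstLetterFn` and `ByExample` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.crunch.MapFn;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.impl.mem.collect.MemCollection;
import org.apache.crunch.types.writable.Writables;

class ByExample {
  // Hypothetical key extractor: the first letter of each word.
  static class FirstLetterFn extends MapFn<String, String> {
    @Override
    public String map(String word) {
      return word.substring(0, 1);
    }
  }

  static List<Pair<String, String>> keyedByFirstLetter() {
    MemCollection<String> words = new MemCollection<String>(
        Arrays.asList("apple", "avocado", "banana"), Writables.strings());
    // Each element is kept as the value; the MapFn output becomes its key.
    PTable<String, String> byLetter = words.by(new FirstLetterFn(), Writables.strings());
    List<Pair<String, String>> out = new ArrayList<Pair<String, String>>();
    for (Pair<String, String> pair : byLetter.materialize()) {
      out.add(pair);
    }
    return out;
  }
}
```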


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.