This project has retired. For details please refer to its Attic page.
PCollectionImpl (Apache Crunch 0.3.0-incubating API)

org.apache.crunch.impl.mr.collect
Class PCollectionImpl<S>

java.lang.Object
  extended by org.apache.crunch.impl.mr.collect.PCollectionImpl<S>
All Implemented Interfaces:
PCollection<S>
Direct Known Subclasses:
DoCollectionImpl, InputCollection, PGroupedTableImpl, PTableBase, UnionCollection

public abstract class PCollectionImpl<S>
extends Object
implements PCollection<S>


Nested Class Summary
static interface PCollectionImpl.Visitor
           
 
Constructor Summary
PCollectionImpl(String name)
           
 
Method Summary
 void accept(PCollectionImpl.Visitor visitor)
           
<K> PTable<K,S>
by(MapFn<S,K> mapFn, PType<K> keyType)
          Apply the given map function to each element of this instance in order to create a PTable.
<K> PTable<K,S>
by(String name, MapFn<S,K> mapFn, PType<K> keyType)
          Apply the given map function to each element of this instance in order to create a PTable.
 PTable<S,Long> count()
          Returns a PTable instance that contains the counts of each unique element of this PCollection.
abstract  DoNode createDoNode()
           
 PCollection<S> filter(FilterFn<S> filterFn)
          Apply the given filter function to this instance and return the resulting PCollection.
 PCollection<S> filter(String name, FilterFn<S> filterFn)
          Apply the given filter function to this instance and return the resulting PCollection.
 int getDepth()
           
 SourceTarget<S> getMaterializedAt()
           
 String getName()
          Returns a shorthand name for this PCollection.
 PCollectionImpl<?> getOnlyParent()
           
abstract  List<PCollectionImpl<?>> getParents()
           
 Pipeline getPipeline()
          Returns the Pipeline associated with this PCollection.
 long getSize()
          Returns the size of the data represented by this PCollection in bytes.
 PTypeFamily getTypeFamily()
          Returns the PTypeFamily of this PCollection.
 Iterable<S> materialize()
          Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally.
 void materializeAt(SourceTarget<S> sourceTarget)
           
 PCollection<S> max()
          Returns a PCollection made up of only the maximum element of this instance.
 PCollection<S> min()
          Returns a PCollection made up of only the minimum element of this instance.
<K,V> PTable<K,V>
parallelDo(DoFn<S,Pair<K,V>> fn, PTableType<K,V> type)
          Similar to the parallelDo overload that returns a PCollection, but returns a PTable instead.
<T> PCollection<T>
parallelDo(DoFn<S,T> fn, PType<T> type)
          Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
<K,V> PTable<K,V>
parallelDo(String name, DoFn<S,Pair<K,V>> fn, PTableType<K,V> type)
          Similar to the parallelDo overload that returns a PCollection, but returns a PTable instead.
<T> PCollection<T>
parallelDo(String name, DoFn<S,T> fn, PType<T> type)
          Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
 PCollection<S> sample(double acceptanceProbability)
          Randomly sample items from this PCollection instance with the given probability of an item being accepted.
 PCollection<S> sample(double acceptanceProbability, long seed)
          Randomly sample items from this PCollection instance with the given probability of an item being accepted and using the given seed.
 PCollection<S> sort(boolean ascending)
          Returns a PCollection instance that contains all of the elements of this instance in sorted order.
 String toString()
           
 PCollection<S> union(PCollection<S>... collections)
          Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.
 PCollection<S> write(Target target)
          Write the contents of this PCollection to the given Target, using the storage format specified by the target.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.apache.crunch.PCollection
getPType
 

Constructor Detail

PCollectionImpl

public PCollectionImpl(String name)
Method Detail

getName

public String getName()
Description copied from interface: PCollection
Returns a shorthand name for this PCollection.

Specified by:
getName in interface PCollection<S>

toString

public String toString()
Overrides:
toString in class Object

union

public PCollection<S> union(PCollection<S>... collections)
Description copied from interface: PCollection
Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.

Specified by:
union in interface PCollection<S>
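
As a rough illustration of what a union produces, the sketch below models collections as plain java.util.List values. UnionSketch and its union method are hypothetical stand-ins, not Crunch classes, and the sketch assumes that duplicate elements are retained, which this page does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in; not the Crunch implementation.
final class UnionSketch {
    // Concatenates the first list with every other list.
    // Assumption: duplicates are kept (bag semantics).
    @SafeVarargs
    static <S> List<S> union(List<S> first, List<S>... others) {
        List<S> out = new ArrayList<>(first);
        for (List<S> other : others) {
            out.addAll(other);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(union(Arrays.asList("a", "b"), Arrays.asList("b", "c")));
    }
}
```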

parallelDo

public <T> PCollection<T> parallelDo(DoFn<S,T> fn,
                                     PType<T> type)
Description copied from interface: PCollection
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
fn - The DoFn to apply
type - The PType of the resulting PCollection
Returns:
a new PCollection
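
The processing model described above can be sketched without a cluster. The example below is a minimal in-memory sketch: Emitter, DoFn and parallelDo are hypothetical stand-ins declared locally, assumed to mirror the shape of the Crunch types, and the PType argument is left out because the sketch does no serialization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-ins; not the Crunch classes of the same name.
final class ParallelDoSketch {
    interface Emitter<T> {
        void emit(T value);
    }

    interface DoFn<S, T> {
        // May emit zero, one, or many outputs per input element.
        void process(S input, Emitter<T> emitter);
    }

    // Applies fn to every element and collects everything it emits.
    static <S, T> List<T> parallelDo(List<S> input, DoFn<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S element : input) {
            fn.process(element, out::add);
        }
        return out;
    }

    // Example function: split each line into whitespace-separated words.
    static List<String> splitWords(List<String> lines) {
        return parallelDo(lines, (line, emitter) -> {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    emitter.emit(word);
                }
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(splitWords(Arrays.asList("to be", "or not")));
    }
}
```

Because the function receives an emitter rather than returning a value, one input element can yield any number of output elements, which is why the result is a new collection of a possibly different type and size.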

parallelDo

public <T> PCollection<T> parallelDo(String name,
                                     DoFn<S,T> fn,
                                     PType<T> type)
Description copied from interface: PCollection
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
name - An identifier for this processing step, useful for debugging
fn - The DoFn to apply
type - The PType of the resulting PCollection
Returns:
a new PCollection

parallelDo

public <K,V> PTable<K,V> parallelDo(DoFn<S,Pair<K,V>> fn,
                                    PTableType<K,V> type)
Description copied from interface: PCollection
Similar to the parallelDo overload that returns a PCollection, but returns a PTable instead.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
fn - The DoFn to apply
type - The PTableType of the resulting PTable
Returns:
a new PTable

parallelDo

public <K,V> PTable<K,V> parallelDo(String name,
                                    DoFn<S,Pair<K,V>> fn,
                                    PTableType<K,V> type)
Description copied from interface: PCollection
Similar to the parallelDo overload that returns a PCollection, but returns a PTable instead.

Specified by:
parallelDo in interface PCollection<S>
Parameters:
name - An identifier for this processing step
fn - The DoFn to apply
type - The PTableType of the resulting PTable
Returns:
a new PTable

write

public PCollection<S> write(Target target)
Description copied from interface: PCollection
Write the contents of this PCollection to the given Target, using the storage format specified by the target.

Specified by:
write in interface PCollection<S>
Parameters:
target - The target to write to

materialize

public Iterable<S> materialize()
Description copied from interface: PCollection
Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally.

Specified by:
materialize in interface PCollection<S>

getMaterializedAt

public SourceTarget<S> getMaterializedAt()

materializeAt

public void materializeAt(SourceTarget<S> sourceTarget)

filter

public PCollection<S> filter(FilterFn<S> filterFn)
Description copied from interface: PCollection
Apply the given filter function to this instance and return the resulting PCollection.

Specified by:
filter in interface PCollection<S>

filter

public PCollection<S> filter(String name,
                             FilterFn<S> filterFn)
Description copied from interface: PCollection
Apply the given filter function to this instance and return the resulting PCollection.

Specified by:
filter in interface PCollection<S>
Parameters:
name - An identifier for this processing step
filterFn - The FilterFn to apply
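
To make the filtering behavior concrete, here is a small in-memory sketch. The FilterFn interface and the filter helper are hypothetical stand-ins declared locally (assumed to resemble the Crunch type, with a single accept method); the name argument is omitted because it only labels the processing step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in; not the Crunch implementation.
final class FilterSketch {
    interface FilterFn<S> {
        // Returns true when the element should be kept.
        boolean accept(S input);
    }

    // Keeps the elements that the function accepts, in their original order.
    static <S> List<S> filter(List<S> input, FilterFn<S> fn) {
        List<S> out = new ArrayList<>();
        for (S element : input) {
            if (fn.accept(element)) {
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filter(Arrays.asList(1, 2, 3, 4), n -> n % 2 == 0));
    }
}
```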

by

public <K> PTable<K,S> by(MapFn<S,K> mapFn,
                          PType<K> keyType)
Description copied from interface: PCollection
Apply the given map function to each element of this instance in order to create a PTable.

Specified by:
by in interface PCollection<S>

by

public <K> PTable<K,S> by(String name,
                          MapFn<S,K> mapFn,
                          PType<K> keyType)
Description copied from interface: PCollection
Apply the given map function to each element of this instance in order to create a PTable.

Specified by:
by in interface PCollection<S>
Parameters:
name - An identifier for this processing step
mapFn - The MapFn to apply
keyType - The PType of the keys of the resulting PTable
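
The return type PTable<K,S> suggests that each element becomes the value of a pair whose key is computed by the map function. The sketch below illustrates that reading in memory; ByKeySketch and its by method are hypothetical, java.util.function.Function stands in for MapFn, and Map.Entry stands in for the table's key/value pairs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory stand-in; not the Crunch implementation.
final class ByKeySketch {
    // Pairs every element with the key extracted from it: (mapFn(element), element).
    static <S, K> List<Map.Entry<K, S>> by(List<S> input, Function<S, K> mapFn) {
        List<Map.Entry<K, S>> out = new ArrayList<>();
        for (S element : input) {
            out.add(new SimpleEntry<>(mapFn.apply(element), element));
        }
        return out;
    }

    public static void main(String[] args) {
        // Key each word by its length.
        System.out.println(by(Arrays.asList("apple", "fig"), String::length));
    }
}
```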

sort

public PCollection<S> sort(boolean ascending)
Description copied from interface: PCollection
Returns a PCollection instance that contains all of the elements of this instance in sorted order.

Specified by:
sort in interface PCollection<S>
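
A minimal in-memory sketch of the ascending flag follows. SortSketch is hypothetical, and it assumes the elements have a natural ordering (Comparable); how Crunch orders elements of a given PType is not described on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory stand-in; not the Crunch implementation.
final class SortSketch {
    // Returns a sorted copy; the input list is left untouched.
    static <S extends Comparable<? super S>> List<S> sort(List<S> input, boolean ascending) {
        List<S> out = new ArrayList<>(input);
        if (ascending) {
            Collections.sort(out);
        } else {
            Collections.sort(out, Collections.reverseOrder());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sort(Arrays.asList(3, 1, 2), true));
        System.out.println(sort(Arrays.asList(3, 1, 2), false));
    }
}
```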

count

public PTable<S,Long> count()
Description copied from interface: PCollection
Returns a PTable instance that contains the counts of each unique element of this PCollection.

Specified by:
count in interface PCollection<S>
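
The shape of the result, one entry per unique element with its number of occurrences, can be sketched in memory as follows. CountSketch is hypothetical and uses a java.util.Map in place of PTable<S,Long>.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in; not the Crunch implementation.
final class CountSketch {
    // Maps each unique element to the number of times it occurs in the input.
    static <S> Map<S, Long> count(List<S> input) {
        Map<S, Long> counts = new LinkedHashMap<>();
        for (S element : input) {
            counts.merge(element, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("a", "b", "a")));
    }
}
```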

max

public PCollection<S> max()
Description copied from interface: PCollection
Returns a PCollection made up of only the maximum element of this instance.

Specified by:
max in interface PCollection<S>

min

public PCollection<S> min()
Description copied from interface: PCollection
Returns a PCollection made up of only the minimum element of this instance.

Specified by:
min in interface PCollection<S>
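
Both max() and min() return a collection rather than a bare element. The in-memory sketch below illustrates that shape; MinMaxSketch is hypothetical, it assumes naturally ordered elements, and returning an empty result for empty input is an assumption of the sketch, not documented behavior.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory stand-in; not the Crunch implementation.
final class MinMaxSketch {
    // Single-element list holding the largest element (empty for empty input, by assumption).
    static <S extends Comparable<? super S>> List<S> max(List<S> input) {
        if (input.isEmpty()) {
            return Collections.emptyList();
        }
        return Collections.singletonList(Collections.max(input));
    }

    // Single-element list holding the smallest element (empty for empty input, by assumption).
    static <S extends Comparable<? super S>> List<S> min(List<S> input) {
        if (input.isEmpty()) {
            return Collections.emptyList();
        }
        return Collections.singletonList(Collections.min(input));
    }

    public static void main(String[] args) {
        System.out.println(max(Arrays.asList(3, 9, 4)));
        System.out.println(min(Arrays.asList(3, 9, 4)));
    }
}
```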

sample

public PCollection<S> sample(double acceptanceProbability)
Description copied from interface: PCollection
Randomly sample items from this PCollection instance with the given probability of an item being accepted.

Specified by:
sample in interface PCollection<S>

sample

public PCollection<S> sample(double acceptanceProbability,
                             long seed)
Description copied from interface: PCollection
Randomly sample items from this PCollection instance with the given probability of an item being accepted and using the given seed.

Specified by:
sample in interface PCollection<S>
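
One plausible reading of the two parameters is independent per-element acceptance driven by a seeded random number generator, so that the same seed reproduces the same sample. The sketch below shows that idea in memory; SampleSketch is hypothetical and uses java.util.Random, which may differ from whatever generator Crunch uses internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical in-memory stand-in; not the Crunch implementation.
final class SampleSketch {
    // Keeps each element independently with the given probability.
    // The same seed over the same input yields the same sample.
    static <S> List<S> sample(List<S> input, double acceptanceProbability, long seed) {
        Random random = new Random(seed);
        List<S> out = new ArrayList<>();
        for (S element : input) {
            // nextDouble() is in [0.0, 1.0), so 1.0 keeps everything and 0.0 keeps nothing.
            if (random.nextDouble() < acceptanceProbability) {
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(sample(data, 0.5, 42L));
    }
}
```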

getTypeFamily

public PTypeFamily getTypeFamily()
Description copied from interface: PCollection
Returns the PTypeFamily of this PCollection.

Specified by:
getTypeFamily in interface PCollection<S>

createDoNode

public abstract DoNode createDoNode()

getParents

public abstract List<PCollectionImpl<?>> getParents()

getOnlyParent

public PCollectionImpl<?> getOnlyParent()

getPipeline

public Pipeline getPipeline()
Description copied from interface: PCollection
Returns the Pipeline associated with this PCollection.

Specified by:
getPipeline in interface PCollection<S>

getDepth

public int getDepth()

accept

public void accept(PCollectionImpl.Visitor visitor)

getSize

public long getSize()
Description copied from interface: PCollection
Returns the size of the data represented by this PCollection in bytes.

Specified by:
getSize in interface PCollection<S>


Copyright © 2012 The Apache Software Foundation. All Rights Reserved.