java.lang.Object
  org.apache.crunch.impl.mr.collect.PCollectionImpl<S>
public abstract class PCollectionImpl<S>
| Nested Class Summary | |
|---|---|
| static interface | PCollectionImpl.Visitor |
| Constructor Summary |
|---|
| PCollectionImpl(String name) |
| Method Summary | | |
|---|---|---|
| void | accept(PCollectionImpl.Visitor visitor) | |
| <K> PTable<K,S> | by(MapFn<S,K> mapFn, PType<K> keyType) | Apply the given map function to each element of this instance in order to create a PTable. |
| <K> PTable<K,S> | by(String name, MapFn<S,K> mapFn, PType<K> keyType) | Apply the given map function to each element of this instance in order to create a PTable. |
| PTable<S,Long> | count() | Returns a PTable instance that contains the counts of each unique element of this PCollection. |
| abstract DoNode | createDoNode() | |
| PCollection<S> | filter(FilterFn<S> filterFn) | Apply the given filter function to this instance and return the resulting PCollection. |
| PCollection<S> | filter(String name, FilterFn<S> filterFn) | Apply the given filter function to this instance and return the resulting PCollection. |
| int | getDepth() | |
| SourceTarget<S> | getMaterializedAt() | |
| String | getName() | Returns a shorthand name for this PCollection. |
| PCollectionImpl<?> | getOnlyParent() | |
| abstract List<PCollectionImpl<?>> | getParents() | |
| Pipeline | getPipeline() | Returns the Pipeline associated with this PCollection. |
| long | getSize() | Returns the size of the data represented by this PCollection in bytes. |
| PTypeFamily | getTypeFamily() | Returns the PTypeFamily of this PCollection. |
| Iterable<S> | materialize() | Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally. |
| void | materializeAt(SourceTarget<S> sourceTarget) | |
| PCollection<S> | max() | Returns a PCollection made up of only the maximum element of this instance. |
| PCollection<S> | min() | Returns a PCollection made up of only the minimum element of this instance. |
| <K,V> PTable<K,V> | parallelDo(DoFn<S,Pair<K,V>> fn, PTableType<K,V> type) | Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection. |
| <T> PCollection<T> | parallelDo(DoFn<S,T> fn, PType<T> type) | Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing. |
| <K,V> PTable<K,V> | parallelDo(String name, DoFn<S,Pair<K,V>> fn, PTableType<K,V> type) | Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection. |
| <T> PCollection<T> | parallelDo(String name, DoFn<S,T> fn, PType<T> type) | Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing. |
| PCollection<S> | sample(double acceptanceProbability) | Randomly sample items from this PCollection instance with the given probability of an item being accepted. |
| PCollection<S> | sample(double acceptanceProbability, long seed) | Randomly sample items from this PCollection instance with the given probability of an item being accepted and using the given seed. |
| PCollection<S> | sort(boolean ascending) | Returns a PCollection instance that contains all of the elements of this instance in sorted order. |
| String | toString() | |
| PCollection<S> | union(PCollection<S>... collections) | Returns a PCollection instance that acts as the union of this PCollection and the input PCollections. |
| PCollection<S> | write(Target target) | Write the contents of this PCollection to the given Target, using the storage format specified by the target. |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |

| Methods inherited from interface org.apache.crunch.PCollection |
|---|
| getPType |
| Constructor Detail |
|---|
public PCollectionImpl(String name)
| Method Detail |
|---|
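public String getName()
Description copied from interface: PCollection
Returns a shorthand name for this PCollection.
Specified by: getName in interface PCollection<S>

public String toString()
Overrides: toString in class Object

public PCollection<S> union(PCollection<S>... collections)
Description copied from interface: PCollection
Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.
Specified by: union in interface PCollection<S>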

public <T> PCollection<T> parallelDo(DoFn<S,T> fn,
PType<T> type)
Description copied from interface: PCollection
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
Specified by: parallelDo in interface PCollection<S>
Parameters:
fn - The DoFn to apply
type - The PType of the resulting PCollection
Returns: a new PCollection
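
As a rough illustration of the parallelDo contract described above (each input element is handed to a function that may emit zero, one, or many outputs), here is a minimal in-memory sketch. It uses only java.util and does not touch Crunch: ParallelDoSketch, its parallelDo helper, and the BiConsumer/Consumer pair standing in for DoFn and its emitter are all hypothetical names chosen for this example, and nothing here reflects how PCollectionImpl plans or runs the work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical stand-in, not part of Crunch: models parallelDo over a List.
class ParallelDoSketch {

    // Calls fn once per input element; fn pushes any number of outputs
    // (including none) into the emitter, and all outputs are collected in order.
    public static <S, T> List<T> parallelDo(List<S> input, BiConsumer<S, Consumer<T>> fn) {
        List<T> output = new ArrayList<>();
        for (S element : input) {
            fn.accept(element, output::add);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b", "", "c");
        // Split each line into words; the empty line emits nothing.
        List<String> words = ParallelDoSketch.<String, String>parallelDo(lines, (line, emit) -> {
            for (String word : line.split(" ")) {
                if (!word.isEmpty()) {
                    emit.accept(word);
                }
            }
        });
        System.out.println(words);
    }
}
```

The point of the sketch is only that the output collection can be larger or smaller than the input, which is what separates parallelDo from a one-to-one mapping such as by.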

public <T> PCollection<T> parallelDo(String name,
DoFn<S,T> fn,
PType<T> type)
Description copied from interface: PCollection
Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
Specified by: parallelDo in interface PCollection<S>
Parameters:
name - An identifier for this processing step, useful for debugging
fn - The DoFn to apply
type - The PType of the resulting PCollection
Returns: a new PCollection

public <K,V> PTable<K,V> parallelDo(DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type)
Description copied from interface: PCollection
Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection.
Specified by: parallelDo in interface PCollection<S>
Parameters:
fn - The DoFn to apply
type - The PTableType of the resulting PTable
Returns: a new PTable
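
public <K,V> PTable<K,V> parallelDo(String name,
DoFn<S,Pair<K,V>> fn,
PTableType<K,V> type)
Description copied from interface: PCollection
Similar to the other parallelDo instance, but returns a PTable instance instead of a PCollection.
Specified by: parallelDo in interface PCollection<S>
Parameters:
name - An identifier for this processing step
fn - The DoFn to apply
type - The PTableType of the resulting PTable
Returns: a new PTable

public PCollection<S> write(Target target)
Description copied from interface: PCollection
Write the contents of this PCollection to the given Target, using the storage format specified by the target.
Specified by: write in interface PCollection<S>
Parameters:
target - The target to write to

public Iterable<S> materialize()
Description copied from interface: PCollection
Returns a reference to the data set represented by this PCollection that may be used by the client to read the data locally.
Specified by: materialize in interface PCollection<S>

public SourceTarget<S> getMaterializedAt()

public void materializeAt(SourceTarget<S> sourceTarget)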
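
public PCollection<S> filter(FilterFn<S> filterFn)
Description copied from interface: PCollection
Apply the given filter function to this instance and return the resulting PCollection.
Specified by: filter in interface PCollection<S>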
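
public PCollection<S> filter(String name,
FilterFn<S> filterFn)
Description copied from interface: PCollection
Apply the given filter function to this instance and return the resulting PCollection.
Specified by: filter in interface PCollection<S>
Parameters:
name - An identifier for this processing step
filterFn - The FilterFn to apply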
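
public <K> PTable<K,S> by(MapFn<S,K> mapFn,
PType<K> keyType)
Description copied from interface: PCollection
Apply the given map function to each element of this instance in order to create a PTable.
Specified by: by in interface PCollection<S>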
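
The signature of by, which returns PTable<K,S>, indicates that the map function supplies the key and the original element is kept as the value. The sketch below models that shape in plain Java under stated assumptions: BySketch and its by helper are hypothetical, a List of Map.Entry stands in for PTable, and java.util.function.Function stands in for MapFn. It is an analogy for the result, not the Crunch implementation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in, not part of Crunch: models by() over a List.
class BySketch {

    // Pairs every element with the key computed by mapFn, keeping the element
    // itself as the value. Duplicate keys are kept as separate entries.
    public static <S, K> List<Map.Entry<K, S>> by(List<S> input, Function<S, K> mapFn) {
        List<Map.Entry<K, S>> table = new ArrayList<>();
        for (S element : input) {
            table.add(new SimpleEntry<>(mapFn.apply(element), element));
        }
        return table;
    }

    public static void main(String[] args) {
        // Key each word by its length.
        List<Map.Entry<Integer, String>> byLength =
                by(Arrays.asList("pig", "crunch", "hive"), String::length);
        System.out.println(byLength); // prints [3=pig, 6=crunch, 4=hive]
    }
}
```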
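
public <K> PTable<K,S> by(String name,
MapFn<S,K> mapFn,
PType<K> keyType)
Description copied from interface: PCollection
Apply the given map function to each element of this instance in order to create a PTable.
Specified by: by in interface PCollection<S>
Parameters:
name - An identifier for this processing step
mapFn - The MapFn to apply

public PCollection<S> sort(boolean ascending)
Description copied from interface: PCollection
Returns a PCollection instance that contains all of the elements of this instance in sorted order.
Specified by: sort in interface PCollection<S>

public PTable<S,Long> count()
Description copied from interface: PCollection
Returns a PTable instance that contains the counts of each unique element of this PCollection.
Specified by: count in interface PCollection<S>

public PCollection<S> max()
Description copied from interface: PCollection
Returns a PCollection made up of only the maximum element of this instance.
Specified by: max in interface PCollection<S>

public PCollection<S> min()
Description copied from interface: PCollection
Returns a PCollection made up of only the minimum element of this instance.
Specified by: min in interface PCollection<S>

public PCollection<S> sample(double acceptanceProbability)
Description copied from interface: PCollection
Randomly sample items from this PCollection instance with the given probability of an item being accepted.
Specified by: sample in interface PCollection<S>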
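
To make the acceptance-probability and seed parameters of the sample overloads concrete, the following is a minimal in-memory sketch using java.util.Random. SampleSketch and its sample helper are hypothetical and exist only for this example. Treating sampling as one independent draw per element is an assumption about the intended semantics; the random source Crunch uses internally is not documented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in, not part of Crunch: models sample() over a List.
class SampleSketch {

    // Keeps each element independently with probability acceptanceProbability.
    // A fixed seed makes the selection repeatable across runs.
    public static <S> List<S> sample(List<S> input, double acceptanceProbability, long seed) {
        Random random = new Random(seed);
        List<S> kept = new ArrayList<>();
        for (S element : input) {
            // nextDouble() is uniform in [0.0, 1.0), so 0.0 keeps nothing and 1.0 keeps everything.
            if (random.nextDouble() < acceptanceProbability) {
                kept.add(element);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<Integer> first = sample(data, 0.5, 42L);
        List<Integer> second = sample(data, 0.5, 42L);
        System.out.println(first.equals(second)); // prints true: same seed, same sample
    }
}
```

In this sketch the size of the result varies from run to run unless the seed is fixed; only the expected fraction of accepted items is controlled by the probability.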
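
public PCollection<S> sample(double acceptanceProbability,
long seed)
Description copied from interface: PCollection
Randomly sample items from this PCollection instance with the given probability of an item being accepted and using the given seed.
Specified by: sample in interface PCollection<S>

public PTypeFamily getTypeFamily()
Description copied from interface: PCollection
Returns the PTypeFamily of this PCollection.
Specified by: getTypeFamily in interface PCollection<S>

public abstract DoNode createDoNode()

public abstract List<PCollectionImpl<?>> getParents()

public PCollectionImpl<?> getOnlyParent()

public Pipeline getPipeline()
Description copied from interface: PCollection
Returns the Pipeline associated with this PCollection.
Specified by: getPipeline in interface PCollection<S>

public int getDepth()

public void accept(PCollectionImpl.Visitor visitor)

public long getSize()
Description copied from interface: PCollection
Returns the size of the data represented by this PCollection in bytes.
Specified by: getSize in interface PCollection<S>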