This project has retired. For details please refer to its Attic page.
Uses of Interface org.apache.crunch.PCollection (Apache Crunch 0.4.0-incubating API)

Uses of Interface
org.apache.crunch.PCollection

Packages that use PCollection
org.apache.crunch Client-facing API and core abstractions. 
org.apache.crunch.contrib.bloomfilter Support for creating Bloom Filters. 
org.apache.crunch.examples Example applications demonstrating various aspects of Crunch. 
org.apache.crunch.impl.mem In-memory Pipeline implementation for rapid prototyping and testing. 
org.apache.crunch.impl.mr A Pipeline implementation that runs on Hadoop MapReduce. 
org.apache.crunch.lib Joining, sorting, aggregating, and other commonly used functionality. 
org.apache.crunch.util An assorted set of utilities. 
 

Uses of PCollection in org.apache.crunch
 

Subinterfaces of PCollection in org.apache.crunch
 interface PGroupedTable<K,V>
          The Crunch representation of a grouped PTable.
 interface PTable<K,V>
          A sub-interface of PCollection that represents an immutable, distributed multi-map of keys and values.
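PTable is described above as an immutable, distributed multi-map, and PGroupedTable as its grouped form. The following plain-Java model is an illustration only and does not use Crunch. The class GroupingModel and its methods are hypothetical. It treats a table as a list of key/value entries in which a key may repeat, and it treats grouping as collecting every value under its key. The first-seen key order is a simplification: the order a real pipeline produces depends on its implementation.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model, not Crunch code: a "table" is a list of
// key/value entries in which the same key may appear more than once.
final class GroupingModel {
  // Convenience factory for a key/value entry.
  static <K, V> Map.Entry<K, V> entry(K key, V value) {
    return new AbstractMap.SimpleImmutableEntry<>(key, value);
  }

  // Like grouping a PTable: collect every value under its key.
  // First-seen key order is a simplification of the model.
  static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> table) {
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> e : table) {
      grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
    }
    return grouped;
  }

  // Like PTable.keys(): one key per entry, so repeated keys repeat here too.
  static <K, V> List<K> keys(List<Map.Entry<K, V>> table) {
    List<K> out = new ArrayList<>();
    for (Map.Entry<K, V> e : table) {
      out.add(e.getKey());
    }
    return out;
  }

  // Like PTable.values(): one value per entry.
  static <K, V> List<V> values(List<Map.Entry<K, V>> table) {
    List<V> out = new ArrayList<>();
    for (Map.Entry<K, V> e : table) {
      out.add(e.getValue());
    }
    return out;
  }
}
```

Because the table is a multi-map, keys() in this model returns one key per entry, repeats included. Grouping is the step that collapses repeated keys.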
 

Methods in org.apache.crunch that return PCollection
 PCollection<S> PCollection.filter(FilterFn<S> filterFn)
          Apply the given filter function to this instance and return the resulting PCollection.
 PCollection<S> PCollection.filter(String name, FilterFn<S> filterFn)
          Apply the given filter function to this instance and return the resulting PCollection.
 PCollection<K> PTable.keys()
          Returns a PCollection made up of the keys in this PTable.
<T> PCollection<T>
PCollection.parallelDo(DoFn<S,T> doFn, PType<T> type)
          Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
<T> PCollection<T>
PCollection.parallelDo(String name, DoFn<S,T> doFn, PType<T> type)
          Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the output of this processing.
<T> PCollection<T>
Pipeline.read(Source<T> source)
          Converts the given Source into a PCollection that is available to jobs run using this Pipeline instance.
 PCollection<String> Pipeline.readTextFile(String pathName)
          A convenience method for reading a text file.
 PCollection<S> PCollection.sample(double acceptanceProbability)
          Randomly sample items from this PCollection instance with the given probability of an item being accepted.
 PCollection<S> PCollection.sample(double acceptanceProbability, long seed)
          Randomly sample items from this PCollection instance with the given probability of an item being accepted and using the given seed.
 PCollection<S> PCollection.sort(boolean ascending)
          Returns a PCollection instance that contains all of the elements of this instance in sorted order.
 PCollection<S> PCollection.union(PCollection<S>... collections)
          Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.
 PCollection<V> PTable.values()
          Returns a PCollection made up of the values in this PTable.
 PCollection<S> PCollection.write(Target target)
          Write the contents of this PCollection to the given Target, using the storage format specified by the target.
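The element-wise methods above (parallelDo, filter, sample) can be modeled with ordinary Java lists. This sketch is an illustration under stated assumptions and is not Crunch code. ElementwiseModel is hypothetical. A BiConsumer stands in for a DoFn, whose process method receives an input and an emitter. A Predicate stands in for FilterFn.accept. The sample method keeps each element independently with the given acceptance probability, using java.util.Random seeded with the given seed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical plain-Java model of element-wise operations; not Crunch's API.
final class ElementwiseModel {
  // Like parallelDo: the function sees each input plus an "emitter" and may
  // emit zero, one, or many outputs for it.
  static <S, T> List<T> parallelDo(List<S> input, BiConsumer<S, Consumer<T>> doFn) {
    List<T> out = new ArrayList<>();
    for (S s : input) {
      doFn.accept(s, out::add);
    }
    return out;
  }

  // Like filter: a parallelDo that emits an element only when accept() is true.
  static <S> List<S> filter(List<S> input, Predicate<S> accept) {
    return ElementwiseModel.<S, S>parallelDo(input, (s, emit) -> {
      if (accept.test(s)) {
        emit.accept(s);
      }
    });
  }

  // Like sample(p, seed): keep each element independently with probability p.
  // The same seed and input always give the same result.
  static <S> List<S> sample(List<S> input, double acceptanceProbability, long seed) {
    Random rnd = new Random(seed);
    List<S> out = new ArrayList<>();
    for (S s : input) {
      if (rnd.nextDouble() < acceptanceProbability) {
        out.add(s);
      }
    }
    return out;
  }
}
```

Because parallelDo may emit any number of outputs per input, filter can be expressed as a parallelDo that emits either zero or one element. The model implements filter that way.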
 

Methods in org.apache.crunch with parameters of type PCollection
<T> Iterable<T>
Pipeline.materialize(PCollection<T> pcollection)
          Create the given PCollection and read the data it contains into the returned Iterable for client use.
 PCollection<S> PCollection.union(PCollection<S>... collections)
          Returns a PCollection instance that acts as the union of this PCollection and the input PCollections.
 void Pipeline.write(PCollection<?> collection, Target target)
          Write the given collection to the given target on the next pipeline run.
<T> void
Pipeline.writeTextFile(PCollection<T> collection, String pathName)
          A convenience method for writing a text file.
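Pipeline.write is documented as taking effect on the next pipeline run, so a write is deferred rather than performed immediately. The toy model below illustrates only that contract. DeferredPipelineModel, its in-memory storage map, and its method names are hypothetical, and they are not the internals of MRPipeline or MemPipeline.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of deferred writes; not Crunch's implementation.
final class DeferredPipelineModel {
  // Writes that have been requested but not yet executed.
  private final List<Runnable> pending = new ArrayList<>();
  // In-memory stand-in for Targets, keyed by a made-up target name.
  private final Map<String, List<String>> storage = new HashMap<>();

  // Like Pipeline.write(collection, target): only records the request.
  void write(List<String> collection, String targetName) {
    pending.add(() -> storage.put(targetName, new ArrayList<>(collection)));
  }

  // Like a pipeline run: executes every pending write, then forgets them.
  void run() {
    for (Runnable job : pending) {
      job.run();
    }
    pending.clear();
  }

  // Returns what was written to the target, or null if nothing has been written yet.
  List<String> read(String targetName) {
    return storage.get(targetName);
  }
}
```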
 

Uses of PCollection in org.apache.crunch.contrib.bloomfilter
 

Methods in org.apache.crunch.contrib.bloomfilter with parameters of type PCollection
static
<T> PObject<org.apache.hadoop.util.bloom.BloomFilter>
BloomFilterFactory.createFilter(PCollection<T> collection, BloomFilterFn<T> filterFn)
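BloomFilterFactory.createFilter returns a PObject that wraps an org.apache.hadoop.util.bloom.BloomFilter. A Bloom filter never gives false negatives but can give false positives. The minimal Bloom filter below, built on java.util.BitSet, illustrates that guarantee. It is a hypothetical stand-in, not Hadoop's class. The BloomModel name, the double-hashing scheme derived from hashCode(), and the parameters are all assumptions.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch on java.util.BitSet; NOT Hadoop's
// org.apache.hadoop.util.bloom.BloomFilter, whose hashing differs.
final class BloomModel {
  private final BitSet bits;
  private final int size;      // number of bits (m)
  private final int numHashes; // number of hash functions (k)

  BloomModel(int size, int numHashes) {
    this.bits = new BitSet(size);
    this.size = size;
    this.numHashes = numHashes;
  }

  // Double hashing: index_i = h1 + i * h2 (mod m), both derived from hashCode().
  private int index(Object item, int i) {
    int h1 = item.hashCode();
    int h2 = Integer.rotateLeft(h1, 16) | 1; // odd, so never zero
    return Math.floorMod(h1 + i * h2, size);
  }

  // Sets the k bits chosen for this item.
  void add(Object item) {
    for (int i = 0; i < numHashes; i++) {
      bits.set(index(item, i));
    }
  }

  // false means "definitely absent"; true means "possibly present".
  boolean mightContain(Object item) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(index(item, i))) {
        return false;
      }
    }
    return true;
  }
}
```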
           
 

Uses of PCollection in org.apache.crunch.examples
 

Methods in org.apache.crunch.examples that return PCollection
 PCollection<org.apache.hadoop.hbase.client.Put> WordAggregationHBase.createPut(PTable<String,String> extractedText)
          Creates Puts for inserting the extracted text into HBase.
 

Uses of PCollection in org.apache.crunch.impl.mem
 

Methods in org.apache.crunch.impl.mem that return PCollection
static
<T> PCollection<T>
MemPipeline.collectionOf(Iterable<T> collect)
           
static
<T> PCollection<T>
MemPipeline.collectionOf(T... ts)
           
<T> PCollection<T>
MemPipeline.read(Source<T> source)
           
 PCollection<String> MemPipeline.readTextFile(String pathName)
           
static
<T> PCollection<T>
MemPipeline.typedCollectionOf(PType<T> ptype, Iterable<T> collect)
           
static
<T> PCollection<T>
MemPipeline.typedCollectionOf(PType<T> ptype, T... ts)
           
 

Methods in org.apache.crunch.impl.mem with parameters of type PCollection
<T> Iterable<T>
MemPipeline.materialize(PCollection<T> pcollection)
           
 void MemPipeline.write(PCollection<?> collection, Target target)
           
<T> void
MemPipeline.writeTextFile(PCollection<T> collection, String pathName)
           
 

Uses of PCollection in org.apache.crunch.impl.mr
 

Methods in org.apache.crunch.impl.mr that return PCollection
<S> PCollection<S>
MRPipeline.read(Source<S> source)
           
 PCollection<String> MRPipeline.readTextFile(String pathName)
           
 

Methods in org.apache.crunch.impl.mr with parameters of type PCollection
<T> ReadableSourceTarget<T>
MRPipeline.getMaterializeSourceTarget(PCollection<T> pcollection)
          Retrieve a ReadableSourceTarget that provides access to the contents of a PCollection.
<T> Iterable<T>
MRPipeline.materialize(PCollection<T> pcollection)
           
 void MRPipeline.write(PCollection<?> pcollection, Target target)
           
<T> void
MRPipeline.writeTextFile(PCollection<T> pcollection, String pathName)
           
 

Uses of PCollection in org.apache.crunch.lib
 

Methods in org.apache.crunch.lib that return PCollection
static
<T> PCollection<Tuple3<T,T,T>>
Set.comm(PCollection<T> coll1, PCollection<T> coll2)
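          Compare two sets of elements, like the Unix comm utility: each output triple holds an element in its first slot if it appears only in coll1, in its second slot if only in coll2, and in its third slot if in both.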
static
<U,V> PCollection<Pair<U,V>>
Cartesian.cross(PCollection<U> left, PCollection<V> right)
          Performs a full cross join on the specified PCollections (using the same strategy as Pig's CROSS operator).
static
<U,V> PCollection<Pair<U,V>>
Cartesian.cross(PCollection<U> left, PCollection<V> right, int parallelism)
          Performs a full cross join on the specified PCollections (using the same strategy as Pig's CROSS operator).
static
<T> PCollection<T>
Set.difference(PCollection<T> coll1, PCollection<T> coll2)
          Compute the set difference between two sets of elements: the elements of coll1 that do not appear in coll2.
static
<T> PCollection<T>
Set.intersection(PCollection<T> coll1, PCollection<T> coll2)
          Compute the intersection of two sets of elements.
static
<K,V> PCollection<K>
PTables.keys(PTable<K,V> ptable)
           
static
<S> PCollection<S>
Sample.sample(PCollection<S> input, double probability)
           
static
<S> PCollection<S>
Sample.sample(PCollection<S> input, long seed, double probability)
           
static
<T> PCollection<T>
Sort.sort(PCollection<T> collection)
          Sorts the PCollection using the natural ordering of its elements.
static
<T> PCollection<T>
Sort.sort(PCollection<T> collection, Sort.Order order)
          Sorts the PCollection using the natural ordering of its elements in the order specified.
static
<K,V1,V2,T>
PCollection<T>
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input, DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn, PType<T> ptype)
          Perform a secondary sort on the given PTable instance and then apply a DoFn to the resulting sorted data to yield an output PCollection<T>.
static
<U,V> PCollection<Pair<U,V>>
Sort.sortPairs(PCollection<Pair<U,V>> collection, Sort.ColumnOrder... columnOrders)
          Sorts the PCollection of Pairs using the specified column ordering.
static
<V1,V2,V3,V4>
PCollection<Tuple4<V1,V2,V3,V4>>
Sort.sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection, Sort.ColumnOrder... columnOrders)
          Sorts the PCollection of Tuple4s using the specified column ordering.
static
<V1,V2,V3> PCollection<Tuple3<V1,V2,V3>>
Sort.sortTriples(PCollection<Tuple3<V1,V2,V3>> collection, Sort.ColumnOrder... columnOrders)
          Sorts the PCollection of Tuple3s using the specified column ordering.
static PCollection<TupleN> Sort.sortTuples(PCollection<TupleN> collection, Sort.ColumnOrder... columnOrders)
          Sorts the PCollection of TupleNs using the specified column ordering.
static
<K,V> PCollection<V>
PTables.values(PTable<K,V> ptable)
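Set.comm, Set.difference, Set.intersection, and Cartesian.cross can be modeled with java.util collections to make their outputs concrete. SetOpsModel is hypothetical and is not Crunch code. The model treats inputs to the set operations as sets, so duplicates collapse. A three-element List stands in for Tuple3, Map.Entry stands in for Pair, and output order is a simplification. The comm model assumes the documented semantics: each triple carries its element in exactly one slot, first if only in coll1, second if only in coll2, third if in both.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical plain-Java model of org.apache.crunch.lib set operations.
final class SetOpsModel {
  // Like Set.comm: one triple per distinct element, with the element in exactly
  // one slot: [only in coll1, only in coll2, in both]. A List stands in for Tuple3.
  static <T> List<List<T>> comm(List<T> coll1, List<T> coll2) {
    Set<T> a = new LinkedHashSet<>(coll1);
    Set<T> b = new LinkedHashSet<>(coll2);
    Set<T> all = new LinkedHashSet<>(a);
    all.addAll(b);
    List<List<T>> out = new ArrayList<>();
    for (T t : all) {
      if (a.contains(t) && b.contains(t)) {
        out.add(Arrays.asList(null, null, t));
      } else if (a.contains(t)) {
        out.add(Arrays.asList(t, null, null));
      } else {
        out.add(Arrays.asList(null, t, null));
      }
    }
    return out;
  }

  // Like Set.difference: distinct elements of coll1 that are absent from coll2.
  static <T> List<T> difference(List<T> coll1, List<T> coll2) {
    Set<T> out = new LinkedHashSet<>(coll1);
    out.removeAll(new LinkedHashSet<>(coll2));
    return new ArrayList<>(out);
  }

  // Like Set.intersection: distinct elements present in both inputs.
  static <T> List<T> intersection(List<T> coll1, List<T> coll2) {
    Set<T> out = new LinkedHashSet<>(coll1);
    out.retainAll(new LinkedHashSet<>(coll2));
    return new ArrayList<>(out);
  }

  // Like Cartesian.cross: every (left, right) combination; duplicates are kept.
  static <U, V> List<Map.Entry<U, V>> cross(List<U> left, List<V> right) {
    List<Map.Entry<U, V>> out = new ArrayList<>();
    for (U u : left) {
      for (V v : right) {
        out.add(new AbstractMap.SimpleImmutableEntry<>(u, v));
      }
    }
    return out;
  }
}
```

The cross product grows as the product of the input sizes. For that reason, the cross methods listed here take an optional parallelism argument.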
           
 

Methods in org.apache.crunch.lib with parameters of type PCollection
static
<T> PCollection<Tuple3<T,T,T>>
Set.comm(PCollection<T> coll1, PCollection<T> coll2)
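          Compare two sets of elements, like the Unix comm utility: each output triple holds an element in its first slot if it appears only in coll1, in its second slot if only in coll2, and in its third slot if in both.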
static
<S> PTable<S,Long>
Aggregate.count(PCollection<S> collect)
          Returns a PTable that contains the unique elements of this collection mapped to a count of their occurrences.
static
<U,V> PCollection<Pair<U,V>>
Cartesian.cross(PCollection<U> left, PCollection<V> right)
          Performs a full cross join on the specified PCollections (using the same strategy as Pig's CROSS operator).
static
<U,V> PCollection<Pair<U,V>>
Cartesian.cross(PCollection<U> left, PCollection<V> right, int parallelism)
          Performs a full cross join on the specified PCollections (using the same strategy as Pig's CROSS operator).
static
<T> PCollection<T>
Set.difference(PCollection<T> coll1, PCollection<T> coll2)
          Compute the set difference between two sets of elements: the elements of coll1 that do not appear in coll2.
static
<T> PCollection<T>
Set.intersection(PCollection<T> coll1, PCollection<T> coll2)
          Compute the intersection of two sets of elements.
static
<S> PObject<Long>
Aggregate.length(PCollection<S> collect)
          Returns the number of elements in the provided PCollection.
static
<S> PObject<S>
Aggregate.max(PCollection<S> collect)
          Returns the largest numerical element from the input collection.
static
<S> PObject<S>
Aggregate.min(PCollection<S> collect)
          Returns the smallest numerical element from the input collection.
static
<S> PCollection<S>
Sample.sample(PCollection<S> input, double probability)
           
static
<S> PCollection<S>
Sample.sample(PCollection<S> input, long seed, double probability)
           
static
<T> PCollection<T>
Sort.sort(PCollection<T> collection)
          Sorts the PCollection using the natural ordering of its elements.
static
<T> PCollection<T>
Sort.sort(PCollection<T> collection, Sort.Order order)
          Sorts the PCollection using the natural ordering of its elements in the order specified.
static
<U,V> PCollection<Pair<U,V>>
Sort.sortPairs(PCollection<Pair<U,V>> collection, Sort.ColumnOrder... columnOrders)
          Sorts the PCollection of Pairs using the specified column ordering.
static
<V1,V2,V3,V4>
PCollection<Tuple4<V1,V2,V3,V4>>
Sort.sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection, Sort.ColumnOrder... columnOrders)
          Sorts the PCollection of Tuple4s using the specified column ordering.
static
<V1,V2,V3> PCollection<Tuple3<V1,V2,V3>>
Sort.sortTriples(PCollection<Tuple3<V1,V2,V3>> collection, Sort.ColumnOrder... columnOrders)
          Sorts the PCollection of Tuple3s using the specified column ordering.
static PCollection<TupleN> Sort.sortTuples(PCollection<TupleN> collection, Sort.ColumnOrder... columnOrders)
          Sorts the PCollection of TupleNs using the specified column ordering.
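The Aggregate methods listed above return a PTable (count) or a PObject (length, max, min) whose value is computed lazily by the pipeline. The eager plain-Java model below shows only the values those results hold. AggregateModel is hypothetical and is not Crunch code. Its max and min use natural ordering over Comparable elements. This is an assumption, since the descriptions above speak of numerical elements.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical eager model of the values produced by Aggregate methods.
final class AggregateModel {
  // Like Aggregate.count: each distinct element mapped to its number of occurrences.
  static <S> Map<S, Long> count(List<S> collect) {
    Map<S, Long> counts = new LinkedHashMap<>();
    for (S s : collect) {
      counts.merge(s, 1L, Long::sum);
    }
    return counts;
  }

  // Like Aggregate.length: total number of elements, duplicates included.
  static <S> long length(List<S> collect) {
    return collect.size();
  }

  // Like Aggregate.max / min, assuming natural ordering.
  // Collections.max/min throw NoSuchElementException on an empty list.
  static <S extends Comparable<? super S>> S max(List<S> collect) {
    return Collections.max(collect);
  }

  static <S extends Comparable<? super S>> S min(List<S> collect) {
    return Collections.min(collect);
  }
}
```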
 

Uses of PCollection in org.apache.crunch.util
 

Methods in org.apache.crunch.util that return PCollection
<T> PCollection<T>
CrunchTool.read(Source<T> source)
           
 PCollection<String> CrunchTool.readTextFile(String pathName)
           
 

Methods in org.apache.crunch.util with parameters of type PCollection
 void CrunchTool.write(PCollection<?> pcollection, Target target)
           
 void CrunchTool.writeTextFile(PCollection<?> pcollection, String pathName)
           
 



Copyright © 2012 The Apache Software Foundation. All Rights Reserved.