| Package | Description |
|---|---|
| org.apache.crunch |
Client-facing API and core abstractions.
|
| org.apache.crunch.contrib.bloomfilter |
Support for creating Bloom Filters.
|
| org.apache.crunch.contrib.text | |
| org.apache.crunch.examples |
Example applications demonstrating various aspects of Crunch.
|
| org.apache.crunch.impl.mem |
In-memory Pipeline implementation for rapid prototyping and testing.
|
| org.apache.crunch.impl.mr |
A Pipeline implementation that runs on Hadoop MapReduce.
|
| org.apache.crunch.lib |
Joining, sorting, aggregating, and other commonly used functionality.
|
| org.apache.crunch.util |
An assorted set of utilities.
|
| Modifier and Type | Interface and Description |
|---|---|
interface |
PGroupedTable<K,V>
The Crunch representation of a grouped
PTable. |
interface |
PTable<K,V>
A sub-interface of
PCollection that represents an immutable,
distributed multi-map of keys and values. |
| Modifier and Type | Method and Description |
|---|---|
PCollection<S> |
PCollection.filter(FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting
PCollection. |
PCollection<S> |
PCollection.filter(String name,
FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting
PCollection. |
PCollection<K> |
PTable.keys()
Returns a
PCollection made up of the keys in this PTable. |
<T> PCollection<T> |
PCollection.parallelDo(DoFn<S,T> doFn,
PType<T> type)
Applies the given doFn to the elements of this
PCollection and
returns a new PCollection that is the output of this processing. |
<T> PCollection<T> |
PCollection.parallelDo(String name,
DoFn<S,T> doFn,
PType<T> type)
Applies the given doFn to the elements of this
PCollection and
returns a new PCollection that is the output of this processing. |
<T> PCollection<T> |
PCollection.parallelDo(String name,
DoFn<S,T> doFn,
PType<T> type,
ParallelDoOptions options)
Applies the given doFn to the elements of this
PCollection and
returns a new PCollection that is the output of this processing. |
<T> PCollection<T> |
Pipeline.read(Source<T> source)
Converts the given
Source into a PCollection that is
available to jobs run using this Pipeline instance. |
PCollection<String> |
Pipeline.readTextFile(String pathName)
A convenience method for reading a text file.
|
PCollection<S> |
PCollection.union(PCollection<S>... collections)
Returns a
PCollection instance that acts as the union of this
PCollection and the input PCollections. |
PCollection<V> |
PTable.values()
Returns a
PCollection made up of the values in this PTable. |
PCollection<S> |
PCollection.write(Target target)
Write the contents of this
PCollection to the given Target,
using the storage format specified by the target. |
PCollection<S> |
PCollection.write(Target target,
Target.WriteMode writeMode)
Write the contents of this
PCollection to the given Target,
using the given Target.WriteMode to handle existing
targets. |
| Modifier and Type | Method and Description |
|---|---|
<T> Iterable<T> |
Pipeline.materialize(PCollection<T> pcollection)
Create the given PCollection and read the data it contains into the
returned Collection instance for client use.
|
PCollection<S> |
PCollection.union(PCollection<S>... collections)
Returns a
PCollection instance that acts as the union of this
PCollection and the input PCollections. |
void |
Pipeline.write(PCollection<?> collection,
Target target)
Write the given collection to the given target on the next pipeline run.
|
void |
Pipeline.write(PCollection<?> collection,
Target target,
Target.WriteMode writeMode)
Write the contents of the
PCollection to the given Target,
using the storage format specified by the target and the given
WriteMode for cases where the referenced Target
already exists. |
<T> void |
Pipeline.writeTextFile(PCollection<T> collection,
String pathName)
A convenience method for writing a text file.
|
| Modifier and Type | Method and Description |
|---|---|
static <T> PObject<org.apache.hadoop.util.bloom.BloomFilter> |
BloomFilterFactory.createFilter(PCollection<T> collection,
BloomFilterFn<T> filterFn) |
| Modifier and Type | Method and Description |
|---|---|
static <T> PCollection<T> |
Parse.parse(String groupName,
PCollection<String> input,
Extractor<T> extractor)
Parses the lines of the input
PCollection<String> and returns a PCollection<T> using
the given Extractor<T>. |
static <T> PCollection<T> |
Parse.parse(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<T> extractor)
Parses the lines of the input
PCollection<String> and returns a PCollection<T> using
the given Extractor<T> that uses the given PTypeFamily. |
| Modifier and Type | Method and Description |
|---|---|
static <T> PCollection<T> |
Parse.parse(String groupName,
PCollection<String> input,
Extractor<T> extractor)
Parses the lines of the input
PCollection<String> and returns a PCollection<T> using
the given Extractor<T>. |
static <T> PCollection<T> |
Parse.parse(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<T> extractor)
Parses the lines of the input
PCollection<String> and returns a PCollection<T> using
the given Extractor<T> that uses the given PTypeFamily. |
static <K,V> PTable<K,V> |
Parse.parseTable(String groupName,
PCollection<String> input,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input
PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>>. |
static <K,V> PTable<K,V> |
Parse.parseTable(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input
PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> that uses the given PTypeFamily. |
| Modifier and Type | Method and Description |
|---|---|
PCollection<org.apache.hadoop.hbase.client.Put> |
WordAggregationHBase.createPut(PTable<String,String> extractedText)
Create puts in order to insert them in hbase.
|
| Modifier and Type | Method and Description |
|---|---|
static <T> PCollection<T> |
MemPipeline.collectionOf(Iterable<T> collect) |
static <T> PCollection<T> |
MemPipeline.collectionOf(T... ts) |
<T> PCollection<T> |
MemPipeline.read(Source<T> source) |
PCollection<String> |
MemPipeline.readTextFile(String pathName) |
static <T> PCollection<T> |
MemPipeline.typedCollectionOf(PType<T> ptype,
Iterable<T> collect) |
static <T> PCollection<T> |
MemPipeline.typedCollectionOf(PType<T> ptype,
T... ts) |
| Modifier and Type | Method and Description |
|---|---|
<T> Iterable<T> |
MemPipeline.materialize(PCollection<T> pcollection) |
void |
MemPipeline.write(PCollection<?> collection,
Target target) |
void |
MemPipeline.write(PCollection<?> collection,
Target target,
Target.WriteMode writeMode) |
<T> void |
MemPipeline.writeTextFile(PCollection<T> collection,
String pathName) |
| Modifier and Type | Method and Description |
|---|---|
<S> PCollection<S> |
MRPipeline.read(Source<S> source) |
PCollection<String> |
MRPipeline.readTextFile(String pathName) |
| Modifier and Type | Method and Description |
|---|---|
<T> ReadableSource<T> |
MRPipeline.getMaterializeSourceTarget(PCollection<T> pcollection)
Retrieve a ReadableSourceTarget that provides access to the contents of a
PCollection. |
<T> Iterable<T> |
MRPipeline.materialize(PCollection<T> pcollection) |
void |
MRPipeline.write(PCollection<?> pcollection,
Target target) |
void |
MRPipeline.write(PCollection<?> pcollection,
Target target,
Target.WriteMode writeMode) |
<T> void |
MRPipeline.writeTextFile(PCollection<T> pcollection,
String pathName) |
| Modifier and Type | Method and Description |
|---|---|
static <T> PCollection<Tuple3<T,T,T>> |
Set.comm(PCollection<T> coll1,
PCollection<T> coll2)
Find the elements that are common to two sets, like the Unix
comm utility. |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified
PCollections (using the
same strategy as Pig's CROSS operator). |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified
PCollections (using the
same strategy as Pig's CROSS operator). |
static <T> PCollection<T> |
Set.difference(PCollection<T> coll1,
PCollection<T> coll2)
Compute the set difference between two sets of elements.
|
static <S> PCollection<S> |
Distinct.distinct(PCollection<S> input)
Construct a new
PCollection that contains the unique elements of a
given input PCollection. |
static <S> PCollection<S> |
Distinct.distinct(PCollection<S> input,
int flushEvery)
A
distinct operation that gives the client more control over how frequently
elements are flushed to disk in order to allow control over performance or
memory consumption. |
static <T> PCollection<T> |
Set.intersection(PCollection<T> coll1,
PCollection<T> coll2)
Compute the intersection of two sets of elements.
|
static <K,V> PCollection<K> |
PTables.keys(PTable<K,V> ptable)
Extract the keys from the given
PTable<K, V> as a PCollection<K>. |
static <S> PCollection<S> |
Sample.sample(PCollection<S> input,
double probability)
Output records from the given
PCollection with the given probability. |
static <S> PCollection<S> |
Sample.sample(PCollection<S> input,
long seed,
double probability)
Output records from the given
PCollection using a given seed. |
static <T> PCollection<T> |
Sort.sort(PCollection<T> collection)
Sorts the
PCollection using the natural ordering of its elements. |
static <T> PCollection<T> |
Sort.sort(PCollection<T> collection,
Sort.Order order)
Sorts the
PCollection using the natural ordering of its elements in
the order specified. |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T>. |
static <U,V> PCollection<Pair<U,V>> |
Sort.sortPairs(PCollection<Pair<U,V>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Pairs using the specified column
ordering. |
static <V1,V2,V3,V4> |
Sort.sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Tuple4s using the specified column
ordering. |
static <V1,V2,V3> PCollection<Tuple3<V1,V2,V3>> |
Sort.sortTriples(PCollection<Tuple3<V1,V2,V3>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Tuple3s using the specified column
ordering. |
static PCollection<TupleN> |
Sort.sortTuples(PCollection<TupleN> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of TupleNs using the specified column
ordering. |
static <K,V> PCollection<V> |
PTables.values(PTable<K,V> ptable)
Extract the values from the given
PTable<K, V> as a PCollection<V>. |
| Modifier and Type | Method and Description |
|---|---|
static <K,V> PTable<K,V> |
PTables.asPTable(PCollection<Pair<K,V>> pcollect)
Convert the given
PCollection<Pair<K, V>> to a PTable<K, V>. |
static <T> PCollection<Tuple3<T,T,T>> |
Set.comm(PCollection<T> coll1,
PCollection<T> coll2)
Find the elements that are common to two sets, like the Unix
comm utility. |
static <T> PCollection<Tuple3<T,T,T>> |
Set.comm(PCollection<T> coll1,
PCollection<T> coll2)
Find the elements that are common to two sets, like the Unix
comm utility. |
static <S> PTable<S,Long> |
Aggregate.count(PCollection<S> collect)
Returns a
PTable that contains the unique elements of this collection mapped to a count
of their occurrences. |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified
PCollections (using the
same strategy as Pig's CROSS operator). |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified
PCollections (using the
same strategy as Pig's CROSS operator). |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified
PCollections (using the
same strategy as Pig's CROSS operator). |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified
PCollections (using the
same strategy as Pig's CROSS operator). |
static <T> PCollection<T> |
Set.difference(PCollection<T> coll1,
PCollection<T> coll2)
Compute the set difference between two sets of elements.
|
static <T> PCollection<T> |
Set.difference(PCollection<T> coll1,
PCollection<T> coll2)
Compute the set difference between two sets of elements.
|
static <S> PCollection<S> |
Distinct.distinct(PCollection<S> input)
Construct a new
PCollection that contains the unique elements of a
given input PCollection. |
static <S> PCollection<S> |
Distinct.distinct(PCollection<S> input,
int flushEvery)
A
distinct operation that gives the client more control over how frequently
elements are flushed to disk in order to allow control over performance or
memory consumption. |
static <T> PCollection<T> |
Set.intersection(PCollection<T> coll1,
PCollection<T> coll2)
Compute the intersection of two sets of elements.
|
static <T> PCollection<T> |
Set.intersection(PCollection<T> coll1,
PCollection<T> coll2)
Compute the intersection of two sets of elements.
|
static <S> PObject<Long> |
Aggregate.length(PCollection<S> collect)
Returns the number of elements in the provided PCollection.
|
static <S> PObject<S> |
Aggregate.max(PCollection<S> collect)
Returns the largest numerical element from the input collection.
|
static <S> PObject<S> |
Aggregate.min(PCollection<S> collect)
Returns the smallest numerical element from the input collection.
|
static <S> PCollection<S> |
Sample.sample(PCollection<S> input,
double probability)
Output records from the given
PCollection with the given probability. |
static <S> PCollection<S> |
Sample.sample(PCollection<S> input,
long seed,
double probability)
Output records from the given
PCollection using a given seed. |
static <T> PCollection<T> |
Sort.sort(PCollection<T> collection)
Sorts the
PCollection using the natural ordering of its elements. |
static <T> PCollection<T> |
Sort.sort(PCollection<T> collection,
Sort.Order order)
Sorts the
PCollection using the natural ordering of its elements in
the order specified. |
static <U,V> PCollection<Pair<U,V>> |
Sort.sortPairs(PCollection<Pair<U,V>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Pairs using the specified column
ordering. |
static <V1,V2,V3,V4> |
Sort.sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Tuple4s using the specified column
ordering. |
static <V1,V2,V3> PCollection<Tuple3<V1,V2,V3>> |
Sort.sortTriples(PCollection<Tuple3<V1,V2,V3>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Tuple3s using the specified column
ordering. |
static PCollection<TupleN> |
Sort.sortTuples(PCollection<TupleN> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of TupleNs using the specified column
ordering. |
| Modifier and Type | Method and Description |
|---|---|
<T> PCollection<T> |
CrunchTool.read(Source<T> source) |
PCollection<String> |
CrunchTool.readTextFile(String pathName) |
| Modifier and Type | Method and Description |
|---|---|
void |
CrunchTool.write(PCollection<?> pcollection,
Target target) |
void |
CrunchTool.writeTextFile(PCollection<?> pcollection,
String pathName) |
Copyright © 2013 The Apache Software Foundation. All Rights Reserved.