Package | Description |
---|---|
org.apache.crunch |
Client-facing API and core abstractions.
|
org.apache.crunch.contrib.bloomfilter |
Support for creating Bloom Filters.
|
org.apache.crunch.contrib.text | |
org.apache.crunch.examples |
Example applications demonstrating various aspects of Crunch.
|
org.apache.crunch.impl.mem |
In-memory Pipeline implementation for rapid prototyping and testing.
|
org.apache.crunch.impl.mr |
A Pipeline implementation that runs on Hadoop MapReduce.
|
org.apache.crunch.lib |
Joining, sorting, aggregating, and other commonly used functionality.
|
org.apache.crunch.util |
An assorted set of utilities.
|
Modifier and Type | Interface and Description |
---|---|
interface |
PGroupedTable<K,V>
The Crunch representation of a grouped
PTable . |
interface |
PTable<K,V>
A sub-interface of
PCollection that represents an immutable,
distributed multi-map of keys and values. |
Modifier and Type | Method and Description |
---|---|
PCollection<S> |
PCollection.filter(FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting
PCollection . |
PCollection<S> |
PCollection.filter(String name,
FilterFn<S> filterFn)
Apply the given filter function to this instance and return the resulting
PCollection . |
PCollection<K> |
PTable.keys()
Returns a
PCollection made up of the keys in this PTable. |
<T> PCollection<T> |
PCollection.parallelDo(DoFn<S,T> doFn,
PType<T> type)
Applies the given doFn to the elements of this
PCollection and
returns a new PCollection that is the output of this processing. |
<T> PCollection<T> |
PCollection.parallelDo(String name,
DoFn<S,T> doFn,
PType<T> type)
Applies the given doFn to the elements of this
PCollection and
returns a new PCollection that is the output of this processing. |
<T> PCollection<T> |
PCollection.parallelDo(String name,
DoFn<S,T> doFn,
PType<T> type,
ParallelDoOptions options)
Applies the given doFn to the elements of this
PCollection and
returns a new PCollection that is the output of this processing. |
<T> PCollection<T> |
Pipeline.read(Source<T> source)
Converts the given
Source into a PCollection that is
available to jobs run using this Pipeline instance. |
PCollection<String> |
Pipeline.readTextFile(String pathName)
A convenience method for reading a text file.
|
PCollection<S> |
PCollection.union(PCollection<S>... collections)
Returns a
PCollection instance that acts as the union of this
PCollection and the input PCollection s. |
PCollection<V> |
PTable.values()
Returns a
PCollection made up of the values in this PTable. |
PCollection<S> |
PCollection.write(Target target)
Write the contents of this
PCollection to the given Target ,
using the storage format specified by the target. |
PCollection<S> |
PCollection.write(Target target,
Target.WriteMode writeMode)
Write the contents of this
PCollection to the given Target ,
using the given Target.WriteMode to handle existing
targets. |
Modifier and Type | Method and Description |
---|---|
<T> Iterable<T> |
Pipeline.materialize(PCollection<T> pcollection)
Create the given PCollection and read the data it contains into the
returned Collection instance for client use.
|
PCollection<S> |
PCollection.union(PCollection<S>... collections)
Returns a
PCollection instance that acts as the union of this
PCollection and the input PCollection s. |
void |
Pipeline.write(PCollection<?> collection,
Target target)
Write the given collection to the given target on the next pipeline run.
|
void |
Pipeline.write(PCollection<?> collection,
Target target,
Target.WriteMode writeMode)
Write the contents of the
PCollection to the given Target ,
using the storage format specified by the target and the given
WriteMode for cases where the referenced Target
already exists. |
<T> void |
Pipeline.writeTextFile(PCollection<T> collection,
String pathName)
A convenience method for writing a text file.
|
Modifier and Type | Method and Description |
---|---|
static <T> PObject<org.apache.hadoop.util.bloom.BloomFilter> |
BloomFilterFactory.createFilter(PCollection<T> collection,
BloomFilterFn<T> filterFn) |
Modifier and Type | Method and Description |
---|---|
static <T> PCollection<T> |
Parse.parse(String groupName,
PCollection<String> input,
Extractor<T> extractor)
Parses the lines of the input
PCollection<String> and returns a PCollection<T> using
the given Extractor<T> . |
static <T> PCollection<T> |
Parse.parse(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<T> extractor)
Parses the lines of the input
PCollection<String> and returns a PCollection<T> using
the given Extractor<T> that uses the given PTypeFamily . |
Modifier and Type | Method and Description |
---|---|
static <T> PCollection<T> |
Parse.parse(String groupName,
PCollection<String> input,
Extractor<T> extractor)
Parses the lines of the input
PCollection<String> and returns a PCollection<T> using
the given Extractor<T> . |
static <T> PCollection<T> |
Parse.parse(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<T> extractor)
Parses the lines of the input
PCollection<String> and returns a PCollection<T> using
the given Extractor<T> that uses the given PTypeFamily . |
static <K,V> PTable<K,V> |
Parse.parseTable(String groupName,
PCollection<String> input,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input
PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> . |
static <K,V> PTable<K,V> |
Parse.parseTable(String groupName,
PCollection<String> input,
PTypeFamily ptf,
Extractor<Pair<K,V>> extractor)
Parses the lines of the input
PCollection<String> and returns a PTable<K, V> using
the given Extractor<Pair<K, V>> that uses the given PTypeFamily . |
Modifier and Type | Method and Description |
---|---|
PCollection<org.apache.hadoop.hbase.client.Put> |
WordAggregationHBase.createPut(PTable<String,String> extractedText)
Create puts in order to insert them in hbase.
|
Modifier and Type | Method and Description |
---|---|
static <T> PCollection<T> |
MemPipeline.collectionOf(Iterable<T> collect) |
static <T> PCollection<T> |
MemPipeline.collectionOf(T... ts) |
<T> PCollection<T> |
MemPipeline.read(Source<T> source) |
PCollection<String> |
MemPipeline.readTextFile(String pathName) |
static <T> PCollection<T> |
MemPipeline.typedCollectionOf(PType<T> ptype,
Iterable<T> collect) |
static <T> PCollection<T> |
MemPipeline.typedCollectionOf(PType<T> ptype,
T... ts) |
Modifier and Type | Method and Description |
---|---|
<T> Iterable<T> |
MemPipeline.materialize(PCollection<T> pcollection) |
void |
MemPipeline.write(PCollection<?> collection,
Target target) |
void |
MemPipeline.write(PCollection<?> collection,
Target target,
Target.WriteMode writeMode) |
<T> void |
MemPipeline.writeTextFile(PCollection<T> collection,
String pathName) |
Modifier and Type | Method and Description |
---|---|
<S> PCollection<S> |
MRPipeline.read(Source<S> source) |
PCollection<String> |
MRPipeline.readTextFile(String pathName) |
Modifier and Type | Method and Description |
---|---|
<T> ReadableSource<T> |
MRPipeline.getMaterializeSourceTarget(PCollection<T> pcollection)
Retrieve a ReadableSourceTarget that provides access to the contents of a
PCollection . |
<T> Iterable<T> |
MRPipeline.materialize(PCollection<T> pcollection) |
void |
MRPipeline.write(PCollection<?> pcollection,
Target target) |
void |
MRPipeline.write(PCollection<?> pcollection,
Target target,
Target.WriteMode writeMode) |
<T> void |
MRPipeline.writeTextFile(PCollection<T> pcollection,
String pathName) |
Modifier and Type | Method and Description |
---|---|
static <T> PCollection<Tuple3<T,T,T>> |
Set.comm(PCollection<T> coll1,
PCollection<T> coll2)
Find the elements that are common to two sets, like the Unix
comm utility. |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified
PCollection s (using the
same strategy as Pig's CROSS operator). |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified
PCollection s (using the
same strategy as Pig's CROSS operator). |
static <T> PCollection<T> |
Set.difference(PCollection<T> coll1,
PCollection<T> coll2)
Compute the set difference between two sets of elements.
|
static <S> PCollection<S> |
Distinct.distinct(PCollection<S> input)
Construct a new
PCollection that contains the unique elements of a
given input PCollection . |
static <S> PCollection<S> |
Distinct.distinct(PCollection<S> input,
int flushEvery)
A
distinct operation that gives the client more control over how frequently
elements are flushed to disk in order to allow control over performance or
memory consumption. |
static <T> PCollection<T> |
Set.intersection(PCollection<T> coll1,
PCollection<T> coll2)
Compute the intersection of two sets of elements.
|
static <K,V> PCollection<K> |
PTables.keys(PTable<K,V> ptable)
Extract the keys from the given
PTable<K, V> as a PCollection<K> . |
static <S> PCollection<S> |
Sample.sample(PCollection<S> input,
double probability)
Output records from the given
PCollection with the given probability. |
static <S> PCollection<S> |
Sample.sample(PCollection<S> input,
long seed,
double probability)
Output records from the given
PCollection using a given seed. |
static <T> PCollection<T> |
Sort.sort(PCollection<T> collection)
Sorts the
PCollection using the natural ordering of its elements. |
static <T> PCollection<T> |
Sort.sort(PCollection<T> collection,
Sort.Order order)
Sorts the
PCollection using the natural ordering of its elements in
the order specified. |
static <K,V1,V2,T> |
SecondarySort.sortAndApply(PTable<K,Pair<V1,V2>> input,
DoFn<Pair<K,Iterable<Pair<V1,V2>>>,T> doFn,
PType<T> ptype)
Perform a secondary sort on the given
PTable instance and then apply a
DoFn to the resulting sorted data to yield an output PCollection<T> . |
static <U,V> PCollection<Pair<U,V>> |
Sort.sortPairs(PCollection<Pair<U,V>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Pair s using the specified column
ordering. |
static <V1,V2,V3,V4> |
Sort.sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Tuple4 s using the specified column
ordering. |
static <V1,V2,V3> PCollection<Tuple3<V1,V2,V3>> |
Sort.sortTriples(PCollection<Tuple3<V1,V2,V3>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Tuple3 s using the specified column
ordering. |
static PCollection<TupleN> |
Sort.sortTuples(PCollection<TupleN> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of TupleN s using the specified column
ordering. |
static <K,V> PCollection<V> |
PTables.values(PTable<K,V> ptable)
Extract the values from the given
PTable<K, V> as a PCollection<V> . |
Modifier and Type | Method and Description |
---|---|
static <K,V> PTable<K,V> |
PTables.asPTable(PCollection<Pair<K,V>> pcollect)
Convert the given
PCollection<Pair<K, V>> to a PTable<K, V> . |
static <T> PCollection<Tuple3<T,T,T>> |
Set.comm(PCollection<T> coll1,
PCollection<T> coll2)
Find the elements that are common to two sets, like the Unix
comm utility. |
static <T> PCollection<Tuple3<T,T,T>> |
Set.comm(PCollection<T> coll1,
PCollection<T> coll2)
Find the elements that are common to two sets, like the Unix
comm utility. |
static <S> PTable<S,Long> |
Aggregate.count(PCollection<S> collect)
Returns a
PTable that contains the unique elements of this collection mapped to a count
of their occurrences. |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified
PCollection s (using the
same strategy as Pig's CROSS operator). |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right)
Performs a full cross join on the specified
PCollection s (using the
same strategy as Pig's CROSS operator). |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified
PCollection s (using the
same strategy as Pig's CROSS operator). |
static <U,V> PCollection<Pair<U,V>> |
Cartesian.cross(PCollection<U> left,
PCollection<V> right,
int parallelism)
Performs a full cross join on the specified
PCollection s (using the
same strategy as Pig's CROSS operator). |
static <T> PCollection<T> |
Set.difference(PCollection<T> coll1,
PCollection<T> coll2)
Compute the set difference between two sets of elements.
|
static <T> PCollection<T> |
Set.difference(PCollection<T> coll1,
PCollection<T> coll2)
Compute the set difference between two sets of elements.
|
static <S> PCollection<S> |
Distinct.distinct(PCollection<S> input)
Construct a new
PCollection that contains the unique elements of a
given input PCollection . |
static <S> PCollection<S> |
Distinct.distinct(PCollection<S> input,
int flushEvery)
A
distinct operation that gives the client more control over how frequently
elements are flushed to disk in order to allow control over performance or
memory consumption. |
static <T> PCollection<T> |
Set.intersection(PCollection<T> coll1,
PCollection<T> coll2)
Compute the intersection of two sets of elements.
|
static <T> PCollection<T> |
Set.intersection(PCollection<T> coll1,
PCollection<T> coll2)
Compute the intersection of two sets of elements.
|
static <S> PObject<Long> |
Aggregate.length(PCollection<S> collect)
Returns the number of elements in the provided PCollection.
|
static <S> PObject<S> |
Aggregate.max(PCollection<S> collect)
Returns the largest numerical element from the input collection.
|
static <S> PObject<S> |
Aggregate.min(PCollection<S> collect)
Returns the smallest numerical element from the input collection.
|
static <S> PCollection<S> |
Sample.sample(PCollection<S> input,
double probability)
Output records from the given
PCollection with the given probability. |
static <S> PCollection<S> |
Sample.sample(PCollection<S> input,
long seed,
double probability)
Output records from the given
PCollection using a given seed. |
static <T> PCollection<T> |
Sort.sort(PCollection<T> collection)
Sorts the
PCollection using the natural ordering of its elements. |
static <T> PCollection<T> |
Sort.sort(PCollection<T> collection,
Sort.Order order)
Sorts the
PCollection using the natural ordering of its elements in
the order specified. |
static <U,V> PCollection<Pair<U,V>> |
Sort.sortPairs(PCollection<Pair<U,V>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Pair s using the specified column
ordering. |
static <V1,V2,V3,V4> |
Sort.sortQuads(PCollection<Tuple4<V1,V2,V3,V4>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Tuple4 s using the specified column
ordering. |
static <V1,V2,V3> PCollection<Tuple3<V1,V2,V3>> |
Sort.sortTriples(PCollection<Tuple3<V1,V2,V3>> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of Tuple3 s using the specified column
ordering. |
static PCollection<TupleN> |
Sort.sortTuples(PCollection<TupleN> collection,
Sort.ColumnOrder... columnOrders)
Sorts the
PCollection of TupleN s using the specified column
ordering. |
Modifier and Type | Method and Description |
---|---|
<T> PCollection<T> |
CrunchTool.read(Source<T> source) |
PCollection<String> |
CrunchTool.readTextFile(String pathName) |
Modifier and Type | Method and Description |
---|---|
void |
CrunchTool.write(PCollection<?> pcollection,
Target target) |
void |
CrunchTool.writeTextFile(PCollection<?> pcollection,
String pathName) |
Copyright © 2013 The Apache Software Foundation. All Rights Reserved.