public class Lambda extends Object
Creates LCollection, LTable, and LGroupedTable objects from their corresponding PCollection, PTable, and PGroupedTable types.
The crunch-lambda API allows you to write Crunch pipelines using lambda expressions and method references instead of creating a class (anonymous, inner, or top level) for each operation. Many pipelines consist of a large number of simple operations rather than a small number of complex ones, so for teams able to use Java 8 in their distributed computation environments this style is quicker to write and easier to read.
You use the API by wrapping your Crunch type in an L-type object; this class provides the static methods for doing so. You can then call the lambda API methods on the L-type object, each of which yields further L-type objects. If you need to return to the standard Crunch world (for compatibility with existing code or for complex use cases), you can call underlying() on an L-type object at any point to get the wrapped Crunch object.
Example (the obligatory wordcount):
Pipeline pipeline = new MRPipeline(getClass());
LCollection<String> inputText = Lambda.wrap(pipeline.readTextFile("/path/to/input/file"));
inputText.flatMap(line -> Arrays.stream(line.split(" ")), Writables.strings())
         .count()
         .map(wordCountPair -> wordCountPair.first() + ": " + wordCountPair.second(), Writables.strings())
         .write(To.textFile("/path/to/output/file"));
pipeline.run();
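To make the shape of the chain above easier to follow, here is a minimal local analogue that uses only `java.util.stream` on an in-memory list. It is not Crunch code and does not run on a cluster; the class name `LocalWordCount` and the method `wordCount` are hypothetical, chosen only for illustration. It mirrors the same three steps: split lines into words, count each word, and format each pair as `word: count`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical local sketch; plain Java streams, not the crunch-lambda API.
public class LocalWordCount {

    // Mirrors flatMap -> count -> map from the example, on an in-memory list.
    static List<String> wordCount(List<String> lines) {
        // Split each line into words and count occurrences of each word.
        // A TreeMap keeps the words sorted so the result order is deterministic.
        Map<String, Long> counts = lines.stream()
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));

        // Format each (word, count) pair the same way the pipeline example does.
        return counts.entrySet().stream()
            .map(entry -> entry.getKey() + ": " + entry.getValue())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        wordCount(Arrays.asList("to be or", "not to be")).forEach(System.out::println);
    }
}
```

Unlike this sketch, the Crunch version also takes a PType argument (such as `Writables.strings()`) on each transforming step, because the framework needs to know how to serialize the results.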
Constructor and Description |
---|
Lambda() |
Modifier and Type | Method and Description |
---|---|
static <S> LCollection<S> | wrap(PCollection<S> collection) |
static <K,V> LGroupedTable<K,V> | wrap(PGroupedTable<K,V> collection) |
static <K,V> LTable<K,V> | wrap(PTable<K,V> collection) |
public static <S> LCollection<S> wrap(PCollection<S> collection)
public static <K,V> LTable<K,V> wrap(PTable<K,V> collection)
public static <K,V> LGroupedTable<K,V> wrap(PGroupedTable<K,V> collection)
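The wrap methods and underlying() together form a wrapper pattern: a static factory puts a lambda-friendly facade around an existing object, each operation returns another facade so calls can be chained, and underlying() hands back the wrapped object. The sketch below illustrates that pattern with plain Java collections only. The types `WrapSketch` and `LList` are hypothetical stand-ins and say nothing about how Crunch implements its L-types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration of the wrap/underlying pattern; not Crunch code.
public class WrapSketch {

    // Stand-in for an L-type: adds a lambda-friendly method to a plain List.
    static final class LList<T> {
        private final List<T> delegate;

        private LList(List<T> delegate) {
            this.delegate = delegate;
        }

        // Each operation returns another wrapper, so calls can be chained.
        <R> LList<R> map(Function<? super T, ? extends R> fn) {
            List<R> mapped = delegate.stream().<R>map(fn).collect(Collectors.toList());
            return new LList<>(mapped);
        }

        // Escape hatch back to the wrapped type.
        List<T> underlying() {
            return delegate;
        }
    }

    // Static factory, analogous in spirit to Lambda.wrap(...).
    static <T> LList<T> wrap(List<T> list) {
        return new LList<>(list);
    }

    public static void main(String[] args) {
        List<Integer> lengths = wrap(Arrays.asList("crunch", "lambda"))
            .map(String::length)
            .underlying();
        System.out.println(lengths);
    }
}
```

Overloading the factory by argument type, as Lambda does for PCollection, PTable, and PGroupedTable, lets callers use a single method name while still getting the most specific wrapper type back.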
Copyright © 2016 The Apache Software Foundation. All rights reserved.