public class To extends Object
Static factory methods for creating common Target types.
The To class is intended to be used as part of a literate API for writing the output of Crunch pipelines to common file types. The Target objects created by the factory methods in the To class can be passed either to the write method on the Pipeline class or to the convenience write method on PCollection and PTable instances.
Pipeline pipeline = new MRPipeline(this.getClass());
...
// Write a PCollection<String> to a text file:
PCollection<String> words = ...;
pipeline.write(words, To.textFile("/put/my/words/here"));
// Write a PTable<Text, Text> to a sequence file:
PTable<Text, Text> textToText = ...;
textToText.write(To.sequenceFile("/words/to/words"));
// Write a PCollection<MyAvroObject> to an Avro data file:
PCollection<MyAvroObject> objects = ...;
objects.write(To.avroFile("/my/avro/files"));
// Write a PTable to a custom FileOutputFormat:
PTable<KeyWritable, ValueWritable> custom = ...;
pipeline.write(custom, To.formattedFile("/custom", MyFileFormat.class));
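Both write routes in the example above accept the same Target. As a rough model of how the two routes relate, the sketch below assumes that the convenience write on a collection simply forwards to its owning pipeline and returns the collection so calls can be chained. That forwarding behavior is an assumption, and DemoPipeline and DemoCollection are hypothetical stand-ins (with a plain String in place of a Target), not Crunch classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a pipeline: it only records which outputs were requested.
final class DemoPipeline {
    final List<String> requestedOutputs = new ArrayList<>();

    // Mirrors the shape of pipeline.write(collection, target).
    void write(DemoCollection collection, String target) {
        requestedOutputs.add(collection.name + " -> " + target);
    }
}

// Hypothetical stand-in for a collection that remembers its owning pipeline.
final class DemoCollection {
    final String name;
    final DemoPipeline pipeline;

    DemoCollection(String name, DemoPipeline pipeline) {
        this.name = name;
        this.pipeline = pipeline;
    }

    // Mirrors the shape of collection.write(target): forward to the pipeline
    // (assumed behavior), then return this so further calls can be chained.
    DemoCollection write(String target) {
        pipeline.write(this, target);
        return this;
    }
}
```

Under that assumption, calling write on the pipeline or on the collection registers the same output; the choice between them is a matter of style.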
Constructor and Description |
---|
To() |
Modifier and Type | Method and Description |
---|---|
static Target | avroFile(org.apache.hadoop.fs.Path path) Creates a Target at the given Path that writes data to Avro files. |
static Target | avroFile(String pathName) Creates a Target at the given path name that writes data to Avro files. |
static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> Target | formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<K,V>> formatClass) Creates a Target at the given Path that writes data to a custom FileOutputFormat. |
static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> Target | formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<K,V>> formatClass) Creates a Target at the given path name that writes data to a custom FileOutputFormat. |
static Target | sequenceFile(org.apache.hadoop.fs.Path path) Creates a Target at the given Path that writes data to SequenceFiles. |
static Target | sequenceFile(String pathName) Creates a Target at the given path name that writes data to SequenceFiles. |
static Target | textFile(org.apache.hadoop.fs.Path path) Creates a Target at the given Path that writes data to text files. |
static Target | textFile(String pathName) Creates a Target at the given path name that writes data to text files. |
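Each factory method comes as a pair of overloads, one taking a path name and one taking a Path, and formattedFile additionally bounds its format class by the key and value types. The following is a minimal sketch of that signature pattern, not the Crunch implementation. SketchTarget, SketchWritable, SketchOutputFormat, SketchKey, SketchValue, SketchFormat, and ToSketch are hypothetical stand-ins, java.nio.file.Path stands in for org.apache.hadoop.fs.Path so the block compiles without Hadoop on the classpath, and the delegation from the String overload to the Path overload is an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for Target: records only where and in what format data would go.
final class SketchTarget {
    final Path path;
    final String format;

    SketchTarget(Path path, String format) {
        this.path = path;
        this.format = format;
    }
}

// Hypothetical stand-ins for Writable and FileOutputFormat<K, V>.
interface SketchWritable {}
abstract class SketchOutputFormat<K extends SketchWritable, V extends SketchWritable> {}

// A concrete key, value, and format, playing the role of user-defined classes.
final class SketchKey implements SketchWritable {}
final class SketchValue implements SketchWritable {}
final class SketchFormat extends SketchOutputFormat<SketchKey, SketchValue> {}

final class ToSketch {
    // String overload: assumed here to convert the name and delegate.
    static SketchTarget textFile(String pathName) {
        return textFile(Paths.get(pathName));
    }

    // Path overload: does the actual construction.
    static SketchTarget textFile(Path path) {
        return new SketchTarget(path, "text");
    }

    // The bound on formatClass ties the class token to the K and V type parameters.
    static <K extends SketchWritable, V extends SketchWritable> SketchTarget formattedFile(
            String pathName, Class<? extends SketchOutputFormat<K, V>> formatClass) {
        return formattedFile(Paths.get(pathName), formatClass);
    }

    static <K extends SketchWritable, V extends SketchWritable> SketchTarget formattedFile(
            Path path, Class<? extends SketchOutputFormat<K, V>> formatClass) {
        return new SketchTarget(path, formatClass.getSimpleName());
    }
}
```

With these stand-ins, ToSketch.formattedFile("/custom", SketchFormat.class) compiles because SketchFormat fixes K and V, while a class token that does not extend SketchOutputFormat (for example String.class) is rejected at compile time.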
public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> Target formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<K,V>> formatClass)
Creates a Target at the given path name that writes data to a custom FileOutputFormat.
Parameters:
pathName - The name of the path to write the data to on the filesystem
formatClass - The FileOutputFormat<K, V> to write the data to
Returns:
a Target instance

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> Target formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<K,V>> formatClass)
Creates a Target at the given Path that writes data to a custom FileOutputFormat.
Parameters:
path - The Path to write the data to
formatClass - The FileOutputFormat to write the data to
Returns:
a Target instance

public static Target avroFile(String pathName)
Creates a Target at the given path name that writes data to Avro files. The PType for the written data must be for Avro records.
Parameters:
pathName - The name of the path to write the data to on the filesystem
Returns:
a Target instance

public static Target avroFile(org.apache.hadoop.fs.Path path)
Creates a Target at the given Path that writes data to Avro files. The PType for the written data must be for Avro records.
Parameters:
path - The Path to write the data to
Returns:
a Target instance

public static Target sequenceFile(String pathName)
Creates a Target at the given path name that writes data to SequenceFiles.
Parameters:
pathName - The name of the path to write the data to on the filesystem
Returns:
a Target instance

public static Target sequenceFile(org.apache.hadoop.fs.Path path)
Creates a Target at the given Path that writes data to SequenceFiles.
Parameters:
path - The Path to write the data to
Returns:
a Target instance

public static Target textFile(String pathName)
Creates a Target at the given path name that writes data to text files.
Parameters:
pathName - The name of the path to write the data to on the filesystem
Returns:
a Target instance

public static Target textFile(org.apache.hadoop.fs.Path path)
Creates a Target at the given Path that writes data to text files.
Parameters:
path - The Path to write the data to
Returns:
a Target instance

Copyright © 2016 The Apache Software Foundation. All rights reserved.