public class At extends Object
Static factory methods for creating common SourceTarget types, which may be treated as both a Source
and a Target.
The At methods are analogous to the From and To factory methods, but are used for
storing intermediate outputs that need to be passed from one run of a MapReduce pipeline to another run. The
SourceTarget object acts as both a Source and a Target, which enables it to provide this
functionality.
Pipeline pipeline = new MRPipeline(this.getClass());
// Create our intermediate storage location
SourceTarget<String> intermediate = At.textFile("/temptext");
...
// Write out the output of the first phase of a pipeline.
pipeline.write(phase1, intermediate);
// Explicitly call run to kick off the pipeline.
pipeline.run();
// And then kick off a second phase by consuming the output
// from the first phase.
PCollection<String> phase2Input = pipeline.read(intermediate);
...
The SourceTarget abstraction is useful when we care about reading the intermediate
outputs of a pipeline as well as the final results.
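The dual role described above can be illustrated without a cluster. The following is a minimal sketch in plain Java, not Crunch code: the `SimpleSource`, `SimpleTarget`, `SimpleSourceTarget`, and `TextLocation` types are hypothetical stand-ins invented for this illustration, and a local temporary file takes the place of the intermediate storage location.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class SourceTargetSketch {

    // Hypothetical stand-ins for the two roles; these are NOT Crunch's interfaces.
    interface SimpleSource<T> { List<T> read(); }
    interface SimpleTarget<T> { void write(List<T> records); }
    interface SimpleSourceTarget<T> extends SimpleSource<T>, SimpleTarget<T> { }

    // A text-file-backed location that one phase writes and the next phase reads.
    static final class TextLocation implements SimpleSourceTarget<String> {
        private final Path path;

        TextLocation(Path path) { this.path = path; }

        @Override
        public void write(List<String> records) {
            try {
                Files.write(path, records, StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public List<String> read() {
            try {
                return Files.readAllLines(path, StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Phase 1 writes to the location; phase 2 reads the same location back.
    static List<String> runTwoPhases(List<String> phase1Output) {
        try {
            Path tmp = Files.createTempFile("intermediate", ".txt");
            try {
                SimpleSourceTarget<String> intermediate = new TextLocation(tmp);
                intermediate.write(phase1Output); // used as a target
                return intermediate.read();       // used as a source
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runTwoPhases(Arrays.asList("a", "b", "c")));
    }
}
```

Because a single object carries both roles, the code that writes the first phase and the code that reads the second phase cannot disagree about the location or the record format.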
| Constructor and Description |
|---|
| At() |
| Modifier and Type | Method and Description |
|---|---|
| static SourceTarget<org.apache.avro.generic.GenericData.Record> | avroFile(org.apache.hadoop.fs.Path path): Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path. |
| static <T extends org.apache.avro.specific.SpecificRecord> SourceTarget<T> | avroFile(org.apache.hadoop.fs.Path path, Class<T> avroClass): Creates a SourceTarget<T> instance from the Avro file(s) at the given Path. |
| static SourceTarget<org.apache.avro.generic.GenericData.Record> | avroFile(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf): Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path using the FileSystem information contained in the given Configuration instance. |
| static <T> SourceTarget<T> | avroFile(org.apache.hadoop.fs.Path path, PType<T> ptype): Creates a SourceTarget<T> instance from the Avro file(s) at the given Path. |
| static SourceTarget<org.apache.avro.generic.GenericData.Record> | avroFile(String pathName): Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path. |
| static <T extends org.apache.avro.specific.SpecificRecord> SourceTarget<T> | avroFile(String pathName, Class<T> avroClass): Creates a SourceTarget<T> instance from the Avro file(s) at the given path name. |
| static <T> SourceTarget<T> | avroFile(String pathName, PType<T> ptype): Creates a SourceTarget<T> instance from the Avro file(s) at the given path name. |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSourceTarget<K,V> | sequenceFile(org.apache.hadoop.fs.Path path, Class<K> keyClass, Class<V> valueClass): Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path from the key-value pairs in the SequenceFile(s). |
| static <T extends org.apache.hadoop.io.Writable> SourceTarget<T> | sequenceFile(org.apache.hadoop.fs.Path path, Class<T> valueClass): Creates a SourceTarget<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s). |
| static <K,V> TableSourceTarget<K,V> | sequenceFile(org.apache.hadoop.fs.Path path, PType<K> keyType, PType<V> valueType): Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path from the key-value pairs in the SequenceFile(s). |
| static <T> SourceTarget<T> | sequenceFile(org.apache.hadoop.fs.Path path, PType<T> ptype): Creates a SourceTarget<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s). |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSourceTarget<K,V> | sequenceFile(String pathName, Class<K> keyClass, Class<V> valueClass): Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name from the key-value pairs in the SequenceFile(s). |
| static <T extends org.apache.hadoop.io.Writable> SourceTarget<T> | sequenceFile(String pathName, Class<T> valueClass): Creates a SourceTarget<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s). |
| static <K,V> TableSourceTarget<K,V> | sequenceFile(String pathName, PType<K> keyType, PType<V> valueType): Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name from the key-value pairs in the SequenceFile(s). |
| static <T> SourceTarget<T> | sequenceFile(String pathName, PType<T> ptype): Creates a SourceTarget<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s). |
| static SourceTarget<String> | textFile(org.apache.hadoop.fs.Path path): Creates a SourceTarget<String> instance for the text file(s) at the given Path. |
| static <T> SourceTarget<T> | textFile(org.apache.hadoop.fs.Path path, PType<T> ptype): Creates a SourceTarget<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text. |
| static SourceTarget<String> | textFile(String pathName): Creates a SourceTarget<String> instance for the text file(s) at the given path name. |
| static <T> SourceTarget<T> | textFile(String pathName, PType<T> ptype): Creates a SourceTarget<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text. |
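The sequenceFile overloads in the table above come in two shapes: those returning a TableSourceTarget<K, V> expose whole key-value pairs, while those returning a SourceTarget<T> expose only the value field of each pair. The sketch below models that difference in plain Java over an in-memory list. It does not read a real SequenceFile or call Crunch; `tableView`, `valueView`, and `sampleEntries` are hypothetical helpers made up for this illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SequenceFileViewsSketch {

    // Table-style view: every key-value pair is kept
    // (the shape exposed by the TableSourceTarget<K, V> overloads).
    static <K, V> List<Map.Entry<K, V>> tableView(List<Map.Entry<K, V>> entries) {
        return new ArrayList<>(entries);
    }

    // Value-only view: the key of each pair is dropped
    // (the shape exposed by the SourceTarget<T> overloads).
    static <K, V> List<V> valueView(List<Map.Entry<K, V>> entries) {
        List<V> values = new ArrayList<>();
        for (Map.Entry<K, V> entry : entries) {
            values.add(entry.getValue());
        }
        return values;
    }

    // In-memory stand-in for the records stored in a SequenceFile.
    static List<Map.Entry<String, Integer>> sampleEntries() {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        entries.add(new SimpleEntry<>("a", 1));
        entries.add(new SimpleEntry<>("b", 2));
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(tableView(sampleEntries()));
        System.out.println(valueView(sampleEntries()));
    }
}
```

In this model the value-only view discards the keys, so the table-style shape is the one to choose whenever a later phase still needs them.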
public static <T extends org.apache.avro.specific.SpecificRecord> SourceTarget<T> avroFile(String pathName, Class<T> avroClass)
Creates a SourceTarget<T> instance from the Avro file(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A new SourceTarget<T> instance

public static <T extends org.apache.avro.specific.SpecificRecord> SourceTarget<T> avroFile(org.apache.hadoop.fs.Path path, Class<T> avroClass)
Creates a SourceTarget<T> instance from the Avro file(s) at the given Path.
Parameters:
path - The Path to the data
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A new SourceTarget<T> instance

public static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(String pathName)
Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file
at the given path. If the path is a directory, the schema of a file in the directory
will be used to determine the schema to use.
Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A new SourceTarget<GenericData.Record> instance

public static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path)
Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file
at the given path. If the path is a directory, the schema of a file in the directory
will be used to determine the schema to use.
Parameters:
path - The path to the data on the filesystem
Returns:
A new SourceTarget<GenericData.Record> instance

public static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf)
Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file
at the given path using the FileSystem information contained in the given
Configuration instance. If the path is a directory, the schema of a file in
the directory will be used to determine the schema to use.
Parameters:
path - The path to the data on the filesystem
conf - The configuration information
Returns:
A new SourceTarget<GenericData.Record> instance

public static <T> SourceTarget<T> avroFile(String pathName, PType<T> ptype)
Creates a SourceTarget<T> instance from the Avro file(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType for the Avro records
Returns:
A new SourceTarget<T> instance

public static <T> SourceTarget<T> avroFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
Creates a SourceTarget<T> instance from the Avro file(s) at the given Path.
Parameters:
path - The Path to the data
ptype - The PType for the Avro records
Returns:
A new SourceTarget<T> instance

public static <T extends org.apache.hadoop.io.Writable> SourceTarget<T> sequenceFile(String pathName, Class<T> valueClass)
Creates a SourceTarget<T> instance from the SequenceFile(s) at the given path name
from the value field of each key-value pair in the SequenceFile(s).
Parameters:
pathName - The name of the path to the data on the filesystem
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new SourceTarget<T> instance

public static <T extends org.apache.hadoop.io.Writable> SourceTarget<T> sequenceFile(org.apache.hadoop.fs.Path path, Class<T> valueClass)
Creates a SourceTarget<T> instance from the SequenceFile(s) at the given Path
from the value field of each key-value pair in the SequenceFile(s).
Parameters:
path - The Path to the data
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new SourceTarget<T> instance

public static <T> SourceTarget<T> sequenceFile(String pathName, PType<T> ptype)
Creates a SourceTarget<T> instance from the SequenceFile(s) at the given path name
from the value field of each key-value pair in the SequenceFile(s).
Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType for the value of the SequenceFile entry
Returns:
A new SourceTarget<T> instance

public static <T> SourceTarget<T> sequenceFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
Creates a SourceTarget<T> instance from the SequenceFile(s) at the given Path
from the value field of each key-value pair in the SequenceFile(s).
Parameters:
path - The Path to the data
ptype - The PType for the value of the SequenceFile entry
Returns:
A new SourceTarget<T> instance

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSourceTarget<K,V> sequenceFile(String pathName, Class<K> keyClass, Class<V> valueClass)
Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name
from the key-value pairs in the SequenceFile(s).
Parameters:
pathName - The name of the path to the data on the filesystem
keyClass - The Writable type for the key of the SequenceFile entry
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new TableSourceTarget<K, V> instance

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSourceTarget<K,V> sequenceFile(org.apache.hadoop.fs.Path path, Class<K> keyClass, Class<V> valueClass)
Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path
from the key-value pairs in the SequenceFile(s).
Parameters:
path - The Path to the data
keyClass - The Writable type for the key of the SequenceFile entry
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new TableSourceTarget<K, V> instance

public static <K,V> TableSourceTarget<K,V> sequenceFile(String pathName, PType<K> keyType, PType<V> valueType)
Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name
from the key-value pairs in the SequenceFile(s).
Parameters:
pathName - The name of the path to the data on the filesystem
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A new TableSourceTarget<K, V> instance

public static <K,V> TableSourceTarget<K,V> sequenceFile(org.apache.hadoop.fs.Path path, PType<K> keyType, PType<V> valueType)
Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path
from the key-value pairs in the SequenceFile(s).
Parameters:
path - The Path to the data
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A new TableSourceTarget<K, V> instance

public static SourceTarget<String> textFile(String pathName)
Creates a SourceTarget<String> instance for the text file(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A new SourceTarget<String> instance

public static SourceTarget<String> textFile(org.apache.hadoop.fs.Path path)
Creates a SourceTarget<String> instance for the text file(s) at the given Path.
Parameters:
path - The Path to the data
Returns:
A new SourceTarget<String> instance

public static <T> SourceTarget<T> textFile(String pathName, PType<T> ptype)
Creates a SourceTarget<T> instance for the text file(s) at the given path name using
the provided PType<T> to convert the input text.
Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType<T> to use to process the input text
Returns:
A new SourceTarget<T> instance

public static <T> SourceTarget<T> textFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
Creates a SourceTarget<T> instance for the text file(s) at the given Path using
the provided PType<T> to convert the input text.
Parameters:
path - The Path to the data
ptype - The PType<T> to use to process the input text
Returns:
A new SourceTarget<T> instance

Copyright © 2016 The Apache Software Foundation. All rights reserved.