public class From extends Object
Static factory methods for creating common Source types.
The From class is intended to provide a literate API for creating Crunch pipelines from common input file types.
Pipeline pipeline = new MRPipeline(this.getClass());
// Reference the lines of a text file by wrapping the TextInputFormat class.
PCollection<String> lines = pipeline.read(From.textFile("/path/to/myfiles"));
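The class-level example can be expanded into a self-contained sketch of the textFile factories. It assumes the Crunch and Hadoop client jars are on the classpath; the class name TextSourceSketch, its method names, and the path strings are hypothetical. The Avros.strings() variant shows the PType<T> overload: the lines are still read as text, but the resulting collection uses Crunch's Avro type family.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.Source;
import org.apache.crunch.io.From;
import org.apache.crunch.types.avro.Avros;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class: a sketch that assumes the Crunch and Hadoop
// client jars are on the classpath. Paths passed in are placeholders.
public class TextSourceSketch {

  // Plain text: each line becomes one String element.
  // The String overload names the location; the Path overload takes a Hadoop Path.
  static Source<String> lines(String pathName) {
    return From.textFile(pathName);
  }

  static Source<String> lines(Path path) {
    return From.textFile(path);
  }

  // Text read through an explicit PType; Avros.strings() places the
  // resulting elements in the Avro type family instead of the Writable one.
  static Source<String> avroLines(Path path) {
    return From.textFile(path, Avros.strings());
  }

  // Building a Source reads nothing; Crunch evaluates lazily, so the files
  // are only read when the pipeline that uses this PCollection is run.
  static PCollection<String> readLines(Pipeline pipeline, String pathName) {
    return pipeline.read(lines(pathName));
  }

  public static void main(String[] args) {
    Source<String> source = lines("/path/to/myfiles");
    // Prints the Java class that elements of this Source will have.
    System.out.println(source.getType().getTypeClass().getName());
  }
}
```

In a real job, readLines would be called with an MRPipeline, as in the class description, and the pipeline run or closed with done() afterwards.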
Constructor and Description |
---|
From() |
Modifier and Type | Method | Description |
---|---|---|
static <T> Source<T> | avroFile(org.apache.hadoop.fs.Path path, AvroType<T> avroType) | Creates a Source<T> instance from the Avro file(s) at the given Path. |
static <T extends org.apache.avro.specific.SpecificRecord> Source<T> | avroFile(org.apache.hadoop.fs.Path path, Class<T> avroClass) | Creates a Source<T> instance from the Avro file(s) at the given Path. |
static <T> Source<T> | avroFile(String pathName, AvroType<T> avroType) | Creates a Source<T> instance from the Avro file(s) at the given path name. |
static <T extends org.apache.avro.specific.SpecificRecord> Source<T> | avroFile(String pathName, Class<T> avroClass) | Creates a Source<T> instance from the Avro file(s) at the given path name. |
static <K,V> TableSource<K,V> | formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType) | Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods. |
static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass) | Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods. |
static <K,V> TableSource<K,V> | formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType) | Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods. |
static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass) | Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods. |
static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | sequenceFile(org.apache.hadoop.fs.Path path, Class<K> keyClass, Class<V> valueClass) | Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path. |
static <T extends org.apache.hadoop.io.Writable> Source<T> | sequenceFile(org.apache.hadoop.fs.Path path, Class<T> valueClass) | Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given Path. |
static <K,V> TableSource<K,V> | sequenceFile(org.apache.hadoop.fs.Path path, PType<K> keyType, PType<V> valueType) | Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path. |
static <T> Source<T> | sequenceFile(org.apache.hadoop.fs.Path path, PType<T> ptype) | Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given Path. |
static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | sequenceFile(String pathName, Class<K> keyClass, Class<V> valueClass) | Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name. |
static <T extends org.apache.hadoop.io.Writable> Source<T> | sequenceFile(String pathName, Class<T> valueClass) | Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given path name. |
static <K,V> TableSource<K,V> | sequenceFile(String pathName, PType<K> keyType, PType<V> valueType) | Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name. |
static <T> Source<T> | sequenceFile(String pathName, PType<T> ptype) | Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given path name. |
static Source<String> | textFile(org.apache.hadoop.fs.Path path) | Creates a Source<String> instance for the text file(s) at the given Path. |
static <T> Source<T> | textFile(org.apache.hadoop.fs.Path path, PType<T> ptype) | Creates a Source<T> instance for the text file(s) at the given Path, using the provided PType<T> to convert the input text. |
static Source<String> | textFile(String pathName) | Creates a Source<String> instance for the text file(s) at the given path name. |
static <T> Source<T> | textFile(String pathName, PType<T> ptype) | Creates a Source<T> instance for the text file(s) at the given path name, using the provided PType<T> to convert the input text. |
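The factories above split into two families: those returning a Source<T> (one element per record, or per value for SequenceFiles) and those returning a TableSource<K, V> (key-value pairs). The following sketch exercises several of them. It assumes the Crunch and Hadoop client jars are on the classpath; the class name SourceFactorySketch, its method names, and the path strings are hypothetical, and TextInputFormat with LongWritable/Text keys and values is chosen only for illustration.

```java
import org.apache.crunch.Source;
import org.apache.crunch.TableSource;
import org.apache.crunch.io.From;
import org.apache.crunch.types.avro.Avros;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical helper class contrasting the Source and TableSource factories.
// Assumes Crunch and the Hadoop client jars are on the classpath.
public class SourceFactorySketch {

  // Key-value view of SequenceFiles whose entries are (LongWritable, Text).
  // The Class-based overloads require Writable key and value types.
  static TableSource<LongWritable, Text> seqPairs(String pathName) {
    return From.sequenceFile(pathName, LongWritable.class, Text.class);
  }

  // Value-only view of SequenceFiles: keys are ignored and each value is
  // interpreted through the supplied PType (here Writables.strings()).
  static Source<String> seqValues(Path path) {
    return From.sequenceFile(path, Writables.strings());
  }

  // Avro data described by an AvroType instead of a generated SpecificRecord class.
  static Source<String> avroStrings(String pathName) {
    return From.avroFile(pathName, Avros.strings());
  }

  // An existing FileInputFormat wrapped as a TableSource. The new-API
  // TextInputFormat extends FileInputFormat<LongWritable, Text> and
  // produces (byte offset, line) pairs.
  static TableSource<LongWritable, Text> offsetsAndLines(Path path) {
    return From.formattedFile(path, TextInputFormat.class, LongWritable.class, Text.class);
  }

  public static void main(String[] args) {
    TableSource<LongWritable, Text> pairs = seqPairs("/path/to/seqfiles");
    // Key class of the table source, then element class of the value-only source.
    System.out.println(pairs.getTableType().getKeyType().getTypeClass().getSimpleName());
    System.out.println(seqValues(new Path("/path/to/seqfiles")).getType().getTypeClass().getSimpleName());
  }
}
```

Passing a TableSource such as seqPairs(...) to Pipeline.read yields a PTable<LongWritable, Text>, while a plain Source such as seqValues(...) yields a PCollection<String>.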
public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
Parameters:
pathName - The name of the path to the data on the filesystem
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A TableSource<K, V> instance

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
Parameters:
path - The Path to the data
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A TableSource<K, V> instance

public static <K,V> TableSource<K,V> formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
Parameters:
pathName - The name of the path to the data on the filesystem
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A TableSource<K, V> instance

public static <K,V> TableSource<K,V> formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
Parameters:
path - The Path to the data
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A TableSource<K, V> instance

public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(String pathName, Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A Source<T> instance

public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(org.apache.hadoop.fs.Path path, Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given Path.
Parameters:
path - The Path to the data
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A Source<T> instance

public static <T> Source<T> avroFile(String pathName, AvroType<T> avroType)
Creates a Source<T> instance from the Avro file(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
avroType - The AvroType for the Avro records
Returns:
A Source<T> instance

public static <T> Source<T> avroFile(org.apache.hadoop.fs.Path path, AvroType<T> avroType)
Creates a Source<T> instance from the Avro file(s) at the given Path.
Parameters:
path - The Path to the data
avroType - The AvroType for the Avro records
Returns:
A Source<T> instance

public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(String pathName, Class<T> valueClass)
Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A Source<T> instance

public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(org.apache.hadoop.fs.Path path, Class<T> valueClass)
Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given Path.
Parameters:
path - The Path to the data
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A Source<T> instance

public static <T> Source<T> sequenceFile(String pathName, PType<T> ptype)
Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType for the value of the SequenceFile entry
Returns:
A Source<T> instance

public static <T> Source<T> sequenceFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given Path.
Parameters:
path - The Path to the data
ptype - The PType for the value of the SequenceFile entry
Returns:
A Source<T> instance

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(String pathName, Class<K> keyClass, Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(org.apache.hadoop.fs.Path path, Class<K> keyClass, Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.
Parameters:
path - The Path to the data
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance

public static <K,V> TableSource<K,V> sequenceFile(String pathName, PType<K> keyType, PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance

public static <K,V> TableSource<K,V> sequenceFile(org.apache.hadoop.fs.Path path, PType<K> keyType, PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.
Parameters:
path - The Path to the data
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance

public static Source<String> textFile(String pathName)
Creates a Source<String> instance for the text file(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A Source<String> instance

public static Source<String> textFile(org.apache.hadoop.fs.Path path)
Creates a Source<String> instance for the text file(s) at the given Path.
Parameters:
path - The Path to the data
Returns:
A Source<String> instance

public static <T> Source<T> textFile(String pathName, PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given path name, using the provided PType<T> to convert the input text.
Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType<T> to use to process the input text
Returns:
A Source<T> instance

public static <T> Source<T> textFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given Path, using the provided PType<T> to convert the input text.
Parameters:
path - The Path to the data
ptype - The PType<T> to use to process the input text
Returns:
A Source<T> instance

Copyright © 2013 The Apache Software Foundation. All Rights Reserved.