java.lang.Object
  extended by org.apache.crunch.io.From
public class From
Static factory methods for creating common Source types.
The From class is intended to provide a literate API for creating
Crunch pipelines from common input file types.
```java
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.From;
import org.apache.hadoop.io.LongWritable;

// Runs inside an instance method. MyWritable, MyAvroObject, MyFileInputFormat,
// KeyWritable, and ValueWritable stand for user-defined classes.
Pipeline pipeline = new MRPipeline(this.getClass());
// Reference the lines of a text file by wrapping the TextInputFormat class.
PCollection<String> lines = pipeline.read(From.textFile("/path/to/myfiles"));
// Reference entries from a sequence file where the key is a LongWritable and the
// value is a custom Writable class.
PTable<LongWritable, MyWritable> table = pipeline.read(From.sequenceFile(
    "/path/to/seqfiles", LongWritable.class, MyWritable.class));
// Reference the records from an Avro file, where MyAvroObject implements Avro's
// SpecificRecord interface.
PCollection<MyAvroObject> myObjects = pipeline.read(From.avroFile("/path/to/avrofiles",
    MyAvroObject.class));
// Reference the key-value pairs from a custom extension of FileInputFormat.
PTable<KeyWritable, ValueWritable> custom = pipeline.read(From.formattedFile(
    "/custom", MyFileInputFormat.class, KeyWritable.class, ValueWritable.class));
```
| Constructor Summary |
|---|
| From() |
| Method Summary | | |
|---|---|---|
| Modifier and Type | Method | Description |
| static Source<org.apache.avro.generic.GenericData.Record> | avroFile(List<org.apache.hadoop.fs.Path> paths) | Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given paths. |
| static <T extends org.apache.avro.specific.SpecificRecord> Source<T> | avroFile(List<org.apache.hadoop.fs.Path> paths, Class<T> avroClass) | Creates a Source<T> instance from the Avro file(s) at the given Paths. |
| static Source<org.apache.avro.generic.GenericData.Record> | avroFile(List<org.apache.hadoop.fs.Path> paths, org.apache.hadoop.conf.Configuration conf) | Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given paths, using the FileSystem information contained in the given Configuration instance. |
| static <T> Source<T> | avroFile(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype) | Creates a Source<T> instance from the Avro file(s) at the given Paths. |
| static Source<org.apache.avro.generic.GenericData.Record> | avroFile(org.apache.hadoop.fs.Path path) | Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path. |
| static <T extends org.apache.avro.specific.SpecificRecord> Source<T> | avroFile(org.apache.hadoop.fs.Path path, Class<T> avroClass) | Creates a Source<T> instance from the Avro file(s) at the given Path. |
| static Source<org.apache.avro.generic.GenericData.Record> | avroFile(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) | Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path, using the FileSystem information contained in the given Configuration instance. |
| static <T> Source<T> | avroFile(org.apache.hadoop.fs.Path path, PType<T> ptype) | Creates a Source<T> instance from the Avro file(s) at the given Path. |
| static Source<org.apache.avro.generic.GenericData.Record> | avroFile(String pathName) | Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path name. |
| static <T extends org.apache.avro.specific.SpecificRecord> Source<T> | avroFile(String pathName, Class<T> avroClass) | Creates a Source<T> instance from the Avro file(s) at the given path name. |
| static <T> Source<T> | avroFile(String pathName, PType<T> ptype) | Creates a Source<T> instance from the Avro file(s) at the given path name. |
| static <K,V> TableSource<K,V> | avroTableFile(List<org.apache.hadoop.fs.Path> paths, PTableType<K,V> tableType) | Creates a TableSource<K,V> for reading Avro key/value files at the given paths. |
| static <K,V> TableSource<K,V> | avroTableFile(org.apache.hadoop.fs.Path path, PTableType<K,V> tableType) | Creates a TableSource<K,V> for reading an Avro key/value file at the given path. |
| static <K,V> TableSource<K,V> | formattedFile(List<org.apache.hadoop.fs.Path> paths, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType) | Creates a TableSource<K, V> for reading files with a custom FileInputFormat implementation that the other TableSource and Source factory methods do not cover. |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | formattedFile(List<org.apache.hadoop.fs.Path> paths, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass) | Creates a TableSource<K, V> for reading files with a custom FileInputFormat<K, V> implementation that the other TableSource and Source factory methods do not cover. |
| static <K,V> TableSource<K,V> | formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType) | Creates a TableSource<K, V> for reading files with a custom FileInputFormat implementation that the other TableSource and Source factory methods do not cover. |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass) | Creates a TableSource<K, V> for reading files with a custom FileInputFormat<K, V> implementation that the other TableSource and Source factory methods do not cover. |
| static <K,V> TableSource<K,V> | formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType) | Creates a TableSource<K, V> for reading files with a custom FileInputFormat implementation that the other TableSource and Source factory methods do not cover. |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass) | Creates a TableSource<K, V> for reading files with a custom FileInputFormat<K, V> implementation that the other TableSource and Source factory methods do not cover. |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | sequenceFile(List<org.apache.hadoop.fs.Path> paths, Class<K> keyClass, Class<V> valueClass) | Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths. |
| static <T extends org.apache.hadoop.io.Writable> Source<T> | sequenceFile(List<org.apache.hadoop.fs.Path> paths, Class<T> valueClass) | Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given Paths. |
| static <K,V> TableSource<K,V> | sequenceFile(List<org.apache.hadoop.fs.Path> paths, PType<K> keyType, PType<V> valueType) | Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths. |
| static <T> Source<T> | sequenceFile(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype) | Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given Paths. |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | sequenceFile(org.apache.hadoop.fs.Path path, Class<K> keyClass, Class<V> valueClass) | Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path. |
| static <T extends org.apache.hadoop.io.Writable> Source<T> | sequenceFile(org.apache.hadoop.fs.Path path, Class<T> valueClass) | Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given Path. |
| static <K,V> TableSource<K,V> | sequenceFile(org.apache.hadoop.fs.Path path, PType<K> keyType, PType<V> valueType) | Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path. |
| static <T> Source<T> | sequenceFile(org.apache.hadoop.fs.Path path, PType<T> ptype) | Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given Path. |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | sequenceFile(String pathName, Class<K> keyClass, Class<V> valueClass) | Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name. |
| static <T extends org.apache.hadoop.io.Writable> Source<T> | sequenceFile(String pathName, Class<T> valueClass) | Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given path name. |
| static <K,V> TableSource<K,V> | sequenceFile(String pathName, PType<K> keyType, PType<V> valueType) | Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name. |
| static <T> Source<T> | sequenceFile(String pathName, PType<T> ptype) | Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given path name. |
| static Source<String> | textFile(List<org.apache.hadoop.fs.Path> paths) | Creates a Source<String> instance for the text file(s) at the given Paths. |
| static <T> Source<T> | textFile(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype) | Creates a Source<T> instance for the text file(s) at the given Paths, using the provided PType<T> to convert the input text. |
| static Source<String> | textFile(org.apache.hadoop.fs.Path path) | Creates a Source<String> instance for the text file(s) at the given Path. |
| static <T> Source<T> | textFile(org.apache.hadoop.fs.Path path, PType<T> ptype) | Creates a Source<T> instance for the text file(s) at the given Path, using the provided PType<T> to convert the input text. |
| static Source<String> | textFile(String pathName) | Creates a Source<String> instance for the text file(s) at the given path name. |
| static <T> Source<T> | textFile(String pathName, PType<T> ptype) | Creates a Source<T> instance for the text file(s) at the given path name, using the provided PType<T> to convert the input text. |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public From()
| Method Detail |
|---|
public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(String pathName,
Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass,
Class<K> keyClass,
Class<V> valueClass)
Creates a TableSource<K, V> for reading files with a custom
FileInputFormat<K, V> implementation that the other TableSource
and Source factory methods do not cover.
Parameters:
pathName - The name of the path to the data on the filesystem
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A TableSource<K, V> instance
public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(org.apache.hadoop.fs.Path path,
Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass,
Class<K> keyClass,
Class<V> valueClass)
Creates a TableSource<K, V> for reading files with a custom
FileInputFormat<K, V> implementation that the other TableSource
and Source factory methods do not cover.
Parameters:
path - The Path to the data
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A TableSource<K, V> instance
public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(List<org.apache.hadoop.fs.Path> paths,
Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass,
Class<K> keyClass,
Class<V> valueClass)
Creates a TableSource<K, V> for reading files with a custom
FileInputFormat<K, V> implementation that the other TableSource
and Source factory methods do not cover.
Parameters:
paths - A list of Paths to the data
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A TableSource<K, V> instance
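The Class-based overloads tie the key and value classes to the FileInputFormat's own type parameters. As a minimal sketch, assuming Hadoop's TextInputFormat (a FileInputFormat<LongWritable, Text>) stands in for the custom format, and using a hypothetical helper name (offsetsAndLines) and input directory, the call type-checks only because K = LongWritable and V = Text:

```java
import org.apache.crunch.TableSource;
import org.apache.crunch.io.From;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class FormattedFileExample {
  // Hypothetical helper: builds a source of (byte offset, line) pairs.
  static TableSource<LongWritable, Text> offsetsAndLines(String dir) {
    // TextInputFormat extends FileInputFormat<LongWritable, Text>, so the
    // key and value classes passed here must be LongWritable and Text.
    return From.formattedFile(new Path(dir), TextInputFormat.class,
        LongWritable.class, Text.class);
  }

  public static void main(String[] args) {
    TableSource<LongWritable, Text> source = offsetsAndLines("/data/logs"); // hypothetical path
    System.out.println(source.getTableType().getKeyType().getTypeClass());
  }
}
```

Passing a key class that does not match the format's key type (for example Text.class with TextInputFormat) would be rejected by the compiler rather than failing at run time.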
public static <K,V> TableSource<K,V> formattedFile(String pathName,
Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass,
PType<K> keyType,
PType<V> valueType)
Creates a TableSource<K, V> for reading files with a custom
FileInputFormat implementation that the other TableSource
and Source factory methods do not cover.
Parameters:
pathName - The name of the path to the data on the filesystem
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A TableSource<K, V> instance
public static <K,V> TableSource<K,V> formattedFile(org.apache.hadoop.fs.Path path,
Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass,
PType<K> keyType,
PType<V> valueType)
Creates a TableSource<K, V> for reading files with a custom
FileInputFormat implementation that the other TableSource
and Source factory methods do not cover.
Parameters:
path - The Path to the data
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A TableSource<K, V> instance
public static <K,V> TableSource<K,V> formattedFile(List<org.apache.hadoop.fs.Path> paths,
Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass,
PType<K> keyType,
PType<V> valueType)
Creates a TableSource<K, V> for reading files with a custom
FileInputFormat implementation that the other TableSource
and Source factory methods do not cover.
Parameters:
paths - A list of Paths to the data
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A TableSource<K, V> instance
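The PType-based overloads accept any FileInputFormat and let the PTypes describe how the format's raw keys and values are presented. A sketch under the assumption that the input is tab-separated text read with Hadoop's KeyValueTextInputFormat (which emits Text keys and values) and that Writables.strings() is a suitable conversion from Text; the helper name and directory are hypothetical:

```java
import org.apache.crunch.TableSource;
import org.apache.crunch.io.From;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class FormattedFilePTypeExample {
  // Hypothetical helper: reads "key<TAB>value" lines as (String, String) pairs.
  static TableSource<String, String> keyValuePairs(String pathName) {
    // KeyValueTextInputFormat produces Text keys and values; the
    // Writables.strings() PTypes present them to the pipeline as Strings.
    return From.formattedFile(pathName, KeyValueTextInputFormat.class,
        Writables.strings(), Writables.strings());
  }

  public static void main(String[] args) {
    TableSource<String, String> source = keyValuePairs("/data/kv"); // hypothetical path
    System.out.println(source.getTableType().getValueType().getTypeClass());
  }
}
```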
public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(String pathName,
Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A Source<T> instance
public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(org.apache.hadoop.fs.Path path,
Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given Path.
Parameters:
path - The Path to the data
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A Source<T> instance
public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(List<org.apache.hadoop.fs.Path> paths,
Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given Paths.
Parameters:
paths - A list of Paths to the data
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A Source<T> instance
public static <T> Source<T> avroFile(String pathName,
PType<T> ptype)
Creates a Source<T> instance from the Avro file(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The AvroType for the Avro records
Returns:
A Source<T> instance
public static <T> Source<T> avroFile(org.apache.hadoop.fs.Path path,
PType<T> ptype)
Creates a Source<T> instance from the Avro file(s) at the given Path.
Parameters:
path - The Path to the data
ptype - The AvroType for the Avro records
Returns:
A Source<T> instance
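When the reader schema is already known, the PType overloads can be given an AvroType built from it, instead of relying on the schema-inferring overloads. A minimal sketch, assuming a hypothetical User record schema and input directory, with Avros.generics building the PType for generic records:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.crunch.Source;
import org.apache.crunch.io.From;
import org.apache.crunch.types.avro.Avros;

public class AvroPTypeExample {
  // Hypothetical reader schema, known in advance rather than read from a file.
  static final Schema USER_SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

  // Builds a Source of generic records typed by the supplied schema.
  static Source<GenericData.Record> users(String pathName) {
    return From.avroFile(pathName, Avros.generics(USER_SCHEMA));
  }

  public static void main(String[] args) {
    System.out.println(users("/data/users").getType()); // hypothetical path
  }
}
```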
public static <T> Source<T> avroFile(List<org.apache.hadoop.fs.Path> paths,
PType<T> ptype)
Creates a Source<T> instance from the Avro file(s) at the given Paths.
Parameters:
paths - A list of Paths to the data
ptype - The PType for the Avro records
Returns:
A Source<T> instance
public static Source<org.apache.avro.generic.GenericData.Record> avroFile(String pathName)
Creates a Source<GenericData.Record> by reading the schema of the Avro file
at the given path. If the path is a directory, the schema of a file in that
directory is used.
Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A Source<GenericData.Record> instance
public static Source<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path)
Creates a Source<GenericData.Record> by reading the schema of the Avro file
at the given path. If the path is a directory, the schema of a file in that
directory is used.
Parameters:
path - The path to the data on the filesystem
Returns:
A Source<GenericData.Record> instance
public static Source<org.apache.avro.generic.GenericData.Record> avroFile(List<org.apache.hadoop.fs.Path> paths)
Creates a Source<GenericData.Record> by reading the schema of the Avro file
at the given paths. If the first path is a directory, the schema of a file in
that directory is used.
Parameters:
paths - A list of paths to the data on the filesystem
Returns:
A Source<GenericData.Record> instance
public static Source<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path,
org.apache.hadoop.conf.Configuration conf)
Creates a Source<GenericData.Record> by reading the schema of the Avro file
at the given path, using the FileSystem information contained in the given
Configuration instance. If the path is a directory, the schema of a file in
that directory is used.
Parameters:
path - The path to the data on the filesystem
conf - The configuration information
Returns:
A Source<GenericData.Record> instance
public static Source<org.apache.avro.generic.GenericData.Record> avroFile(List<org.apache.hadoop.fs.Path> paths,
org.apache.hadoop.conf.Configuration conf)
Creates a Source<GenericData.Record> by reading the schema of the Avro file
at the given paths, using the FileSystem information contained in the given
Configuration instance. If the first path is a directory, the schema of a file in
that directory is used.
Parameters:
paths - A list of paths to the data on the filesystem
conf - The configuration information
Returns:
A Source<GenericData.Record> instance
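Because these overloads derive the schema from existing data, the Avro file has to exist when the Source is created. The sketch below assumes the default Configuration resolves to the local filesystem and uses a hypothetical one-field Event schema: it writes a small Avro container file with Avro's DataFileWriter, then asks From.avroFile for a Source over it, whose AvroType should carry the schema that was written.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.crunch.Source;
import org.apache.crunch.io.From;
import org.apache.crunch.types.avro.AvroType;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class AvroSchemaInferenceExample {
  // Hypothetical schema used only for this illustration.
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\","
          + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");

  // Writes a one-record Avro container file to a temporary local path.
  static File writeSampleFile() {
    try {
      File file = File.createTempFile("events", ".avro");
      GenericRecord record = new GenericData.Record(SCHEMA);
      record.put("id", 42L);
      try (DataFileWriter<GenericRecord> writer =
          new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
        writer.create(SCHEMA, file);
        writer.append(record);
      }
      return file;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Lets From.avroFile read the schema from the file; the Configuration
  // determines which FileSystem is used to open it.
  static Schema inferredSchema(File file) {
    Source<GenericData.Record> source =
        From.avroFile(new Path(file.getAbsolutePath()), new Configuration());
    return ((AvroType<GenericData.Record>) source.getType()).getSchema();
  }

  public static void main(String[] args) {
    System.out.println(inferredSchema(writeSampleFile()));
  }
}
```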
public static <K,V> TableSource<K,V> avroTableFile(org.apache.hadoop.fs.Path path,
PTableType<K,V> tableType)
Creates a TableSource<K,V> for reading an Avro key/value file at the given path.
Parameters:
path - The path to the data on the filesystem
tableType - Avro table type for deserializing the table data
Returns:
A TableSource<K,V> instance for reading Avro key/value data
public static <K,V> TableSource<K,V> avroTableFile(List<org.apache.hadoop.fs.Path> paths,
PTableType<K,V> tableType)
Creates a TableSource<K,V> for reading Avro key/value files at the given paths.
Parameters:
paths - List of paths to be read by the returned source
tableType - Avro table type for deserializing the table data
Returns:
A TableSource<K,V> instance for reading Avro key/value data
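A PTableType for Avro key/value data can be built with Avros.tableOf. As a sketch, assuming a hypothetical directory of (String, Long) pairs and a hypothetical helper name (wordCounts):

```java
import org.apache.crunch.TableSource;
import org.apache.crunch.io.From;
import org.apache.crunch.types.avro.Avros;
import org.apache.hadoop.fs.Path;

public class AvroTableFileExample {
  // Hypothetical helper: reads Avro key/value data as (String, Long) pairs.
  static TableSource<String, Long> wordCounts(String dir) {
    return From.avroTableFile(new Path(dir),
        Avros.tableOf(Avros.strings(), Avros.longs()));
  }

  public static void main(String[] args) {
    TableSource<String, Long> source = wordCounts("/data/wordcounts"); // hypothetical path
    System.out.println(source.getTableType().getValueType().getTypeClass());
  }
}
```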
public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(String pathName,
Class<T> valueClass)
Creates a Source<T> instance that reads the value field of each key-value
pair in the SequenceFile(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A Source<T> instance
public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(org.apache.hadoop.fs.Path path,
Class<T> valueClass)
Creates a Source<T> instance that reads the value field of each key-value
pair in the SequenceFile(s) at the given Path.
Parameters:
path - The Path to the data
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A Source<T> instance
public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(List<org.apache.hadoop.fs.Path> paths,
Class<T> valueClass)
Creates a Source<T> instance that reads the value field of each key-value
pair in the SequenceFile(s) at the given Paths.
Parameters:
paths - A list of Paths to the data
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A Source<T> instance
public static <T> Source<T> sequenceFile(String pathName,
PType<T> ptype)
Creates a Source<T> instance that reads the value field of each key-value
pair in the SequenceFile(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType for the value of the SequenceFile entry
Returns:
A Source<T> instance
public static <T> Source<T> sequenceFile(org.apache.hadoop.fs.Path path,
PType<T> ptype)
Creates a Source<T> instance that reads the value field of each key-value
pair in the SequenceFile(s) at the given Path.
Parameters:
path - The Path to the data
ptype - The PType for the value of the SequenceFile entry
Returns:
A Source<T> instance
public static <T> Source<T> sequenceFile(List<org.apache.hadoop.fs.Path> paths,
PType<T> ptype)
Creates a Source<T> instance that reads the value field of each key-value
pair in the SequenceFile(s) at the given Paths.
Parameters:
paths - A list of Paths to the data
ptype - The PType for the value of the SequenceFile entry
Returns:
A Source<T> instance
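The Class and PType value-only overloads differ in what the resulting collection holds: the Writable itself, or a value converted by the PType. A minimal sketch, assuming SequenceFiles whose values are Text, a hypothetical input directory, and hypothetical helper names:

```java
import org.apache.crunch.Source;
import org.apache.crunch.io.From;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.io.Text;

public class SequenceFileValuesExample {
  // Keeps only the values, as the raw Text Writables stored in the file.
  static Source<Text> rawValues(String pathName) {
    return From.sequenceFile(pathName, Text.class);
  }

  // Keeps only the values, converted from Text to String by the PType.
  static Source<String> stringValues(String pathName) {
    return From.sequenceFile(pathName, Writables.strings());
  }

  public static void main(String[] args) {
    System.out.println(rawValues("/data/seq").getType().getTypeClass()); // hypothetical path
    System.out.println(stringValues("/data/seq").getType().getTypeClass());
  }
}
```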
public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(String pathName,
Class<K> keyClass,
Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance
public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(org.apache.hadoop.fs.Path path,
Class<K> keyClass,
Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.
Parameters:
path - The Path to the data
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance
public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(List<org.apache.hadoop.fs.Path> paths,
Class<K> keyClass,
Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths.
Parameters:
paths - A list of Paths to the data
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance
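For the Class-based table overloads, the key and value classes should be the Writable classes the SequenceFile was written with (a SequenceFile header records both). A sketch assuming LongWritable keys and Text values in a hypothetical directory, with a hypothetical helper name:

```java
import org.apache.crunch.TableSource;
import org.apache.crunch.io.From;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class SequenceFileTableExample {
  // Hypothetical helper: the classes should match the key and value classes
  // recorded in the SequenceFile header.
  static TableSource<LongWritable, Text> entries(String pathName) {
    return From.sequenceFile(pathName, LongWritable.class, Text.class);
  }

  public static void main(String[] args) {
    TableSource<LongWritable, Text> source = entries("/data/seq"); // hypothetical path
    System.out.println(source.getTableType().getKeyType().getTypeClass());
  }
}
```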
public static <K,V> TableSource<K,V> sequenceFile(String pathName,
PType<K> keyType,
PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance
public static <K,V> TableSource<K,V> sequenceFile(org.apache.hadoop.fs.Path path,
PType<K> keyType,
PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.
Parameters:
path - The Path to the data
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance
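The PType-based table overloads convert each Writable key and value into the PType's Java type. A sketch assuming SequenceFiles with LongWritable keys and Text values in a hypothetical directory, converted to Long and String by Writables.longs() and Writables.strings(); the helper name is hypothetical:

```java
import org.apache.crunch.TableSource;
import org.apache.crunch.io.From;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.fs.Path;

public class SequenceFilePTypeExample {
  // Hypothetical helper: reads LongWritable/Text entries and exposes them
  // as Long/String through the Writables PTypes.
  static TableSource<Long, String> entries(String dir) {
    return From.sequenceFile(new Path(dir), Writables.longs(), Writables.strings());
  }

  public static void main(String[] args) {
    TableSource<Long, String> source = entries("/data/seq"); // hypothetical path
    System.out.println(source.getTableType().getValueType().getTypeClass());
  }
}
```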
public static <K,V> TableSource<K,V> sequenceFile(List<org.apache.hadoop.fs.Path> paths,
PType<K> keyType,
PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths.
Parameters:
paths - A list of Paths to the data
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance
public static Source<String> textFile(String pathName)
Creates a Source<String> instance for the text file(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A Source<String> instance
public static Source<String> textFile(org.apache.hadoop.fs.Path path)
Creates a Source<String> instance for the text file(s) at the given Path.
Parameters:
path - The Path to the data
Returns:
A Source<String> instance
public static Source<String> textFile(List<org.apache.hadoop.fs.Path> paths)
Creates a Source<String> instance for the text file(s) at the given Paths.
Parameters:
paths - A list of Paths to the data
Returns:
A Source<String> instance
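To see what a Source<String> yields, the sketch below writes a two-line local file and reads it back through MemPipeline, Crunch's in-memory Pipeline implementation. It assumes that MemPipeline can read the source returned by textFile (that is, the source implements ReadableSource) and that the default filesystem is local; the file contents and helper name are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mem.MemPipeline;
import org.apache.crunch.io.From;

public class TextFileExample {
  // Writes a small local file, reads it back through From.textFile on an
  // in-memory pipeline, and returns the lines that were read.
  static List<String> readLines() {
    try {
      java.nio.file.Path file = Files.createTempFile("lines", ".txt");
      Files.write(file, Arrays.asList("alpha", "beta"), StandardCharsets.UTF_8);
      Pipeline pipeline = MemPipeline.getInstance();
      PCollection<String> lines = pipeline.read(From.textFile(file.toString()));
      List<String> result = new ArrayList<>();
      for (String line : lines.materialize()) {
        result.add(line);
      }
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(readLines());
  }
}
```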
public static <T> Source<T> textFile(String pathName,
PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given path name,
using the provided PType<T> to convert the input text.
Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType<T> to use to process the input text
Returns:
A Source<T> instance
public static <T> Source<T> textFile(org.apache.hadoop.fs.Path path,
PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given Path,
using the provided PType<T> to convert the input text.
Parameters:
path - The Path to the data
ptype - The PType<T> to use to process the input text
Returns:
A Source<T> instance
public static <T> Source<T> textFile(List<org.apache.hadoop.fs.Path> paths,
PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given Paths,
using the provided PType<T> to convert the input text.
Parameters:
paths - A list of Paths to the data
ptype - The PType<T> to use to process the input text
Returns:
A Source<T> instance
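Passing a PType lets the same text input be typed within a chosen type family. As a sketch, assuming a hypothetical input directory and helper name, Avros.strings() yields a Source<String> whose PType belongs to the Avro type family:

```java
import org.apache.crunch.Source;
import org.apache.crunch.io.From;
import org.apache.crunch.types.avro.Avros;

public class TextFilePTypeExample {
  // Hypothetical helper: reads lines of text but types them with an Avro PType.
  static Source<String> avroTypedLines(String pathName) {
    return From.textFile(pathName, Avros.strings());
  }

  public static void main(String[] args) {
    System.out.println(avroTypedLines("/data/in").getType().getFamily()); // hypothetical path
  }
}
```

This can matter when the resulting PCollection is combined with other Avro-typed collections in the same pipeline.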