public class From extends Object
Static factory methods for creating common Source types.
The From class is intended to provide a literate API for creating
 Crunch pipelines from common input file types.
 
 
   Pipeline pipeline = new MRPipeline(this.getClass());
   
   // Reference the lines of a text file by wrapping the TextInputFormat class.
   PCollection<String> lines = pipeline.read(From.textFile("/path/to/myfiles"));
   
   // Reference entries from a sequence file where the key is a LongWritable and the
   // value is a custom Writable class.
   PTable<LongWritable, MyWritable> table = pipeline.read(From.sequenceFile(
       "/path/to/seqfiles", LongWritable.class, MyWritable.class));
   
   // Reference the records from an Avro file, where MyAvroObject implements Avro's
   // SpecificRecord interface.
   PCollection<MyAvroObject> myObjects = pipeline.read(From.avroFile("/path/to/avrofiles",
       MyAvroObject.class));
       
   // Reference the key-value pairs from a custom extension of FileInputFormat.
   PTable<KeyWritable, ValueWritable> custom = pipeline.read(From.formattedFile(
       "/custom", MyFileInputFormat.class, KeyWritable.class, ValueWritable.class));
 
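
In the snippet above, the source returned by From.textFile(...) is read into a PCollection, while the source returned by From.sequenceFile(...) with key and value classes is read into a PTable. The sketch below uses hypothetical stand-in types to show how a pair of overloaded read methods can let the compiler pick the table-shaped result from the factory's return type. ToySource, ToyTableSource, and ToyPipeline are illustrative names rather than Crunch classes, and the code is not meant to describe Crunch's own implementation.

```java
import java.util.Map;

// Hypothetical stand-ins for illustration only; none of these are Crunch types.
interface ToySource<T> {
    String describe();
}

// A table source is modeled here as a source of key-value entries.
interface ToyTableSource<K, V> extends ToySource<Map.Entry<K, V>> {
}

final class ToyPipeline {
    // General overload: applies to any source.
    static <T> String read(ToySource<T> source) {
        return "collection from " + source.describe();
    }

    // More specific overload: preferred by the compiler when the argument's
    // static type is a table source.
    static <K, V> String read(ToyTableSource<K, V> source) {
        return "table from " + source.describe();
    }
}
```

Because the choice is made at compile time from the static type of the argument, a value declared as ToyTableSource goes to the second overload even though it is also a ToySource.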
| Constructor and Description |
|---|
| From() |

| Modifier and Type | Method and Description |
|---|---|
| static Source<org.apache.avro.generic.GenericData.Record> | avroFile(List<org.apache.hadoop.fs.Path> paths): Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given paths. |
| static <T extends org.apache.avro.specific.SpecificRecord> Source<T> | avroFile(List<org.apache.hadoop.fs.Path> paths, Class<T> avroClass): Creates a Source<T> instance from the Avro file(s) at the given Paths. |
| static Source<org.apache.avro.generic.GenericData.Record> | avroFile(List<org.apache.hadoop.fs.Path> paths, org.apache.hadoop.conf.Configuration conf): Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given paths using the FileSystem information contained in the given Configuration instance. |
| static <T> Source<T> | avroFile(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype): Creates a Source<T> instance from the Avro file(s) at the given Paths. |
| static Source<org.apache.avro.generic.GenericData.Record> | avroFile(org.apache.hadoop.fs.Path path): Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path. |
| static <T extends org.apache.avro.specific.SpecificRecord> Source<T> | avroFile(org.apache.hadoop.fs.Path path, Class<T> avroClass): Creates a Source<T> instance from the Avro file(s) at the given Path. |
| static Source<org.apache.avro.generic.GenericData.Record> | avroFile(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf): Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path using the FileSystem information contained in the given Configuration instance. |
| static <T> Source<T> | avroFile(org.apache.hadoop.fs.Path path, PType<T> ptype): Creates a Source<T> instance from the Avro file(s) at the given Path. |
| static Source<org.apache.avro.generic.GenericData.Record> | avroFile(String pathName): Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path. |
| static <T extends org.apache.avro.specific.SpecificRecord> Source<T> | avroFile(String pathName, Class<T> avroClass): Creates a Source<T> instance from the Avro file(s) at the given path name. |
| static <T> Source<T> | avroFile(String pathName, PType<T> ptype): Creates a Source<T> instance from the Avro file(s) at the given path name. |
| static <K,V> TableSource<K,V> | avroTableFile(List<org.apache.hadoop.fs.Path> paths, PTableType<K,V> tableType): Creates a TableSource<K,V> for reading an Avro key/value file at the given paths. |
| static <K,V> TableSource<K,V> | avroTableFile(org.apache.hadoop.fs.Path path, PTableType<K,V> tableType): Creates a TableSource<K,V> for reading an Avro key/value file at the given path. |
| static <K,V> TableSource<K,V> | formattedFile(List<org.apache.hadoop.fs.Path> paths, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType): Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods. |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | formattedFile(List<org.apache.hadoop.fs.Path> paths, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass): Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods. |
| static <K,V> TableSource<K,V> | formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType): Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods. |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass): Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods. |
| static <K,V> TableSource<K,V> | formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType): Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods. |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass): Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods. |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | sequenceFile(List<org.apache.hadoop.fs.Path> paths, Class<K> keyClass, Class<V> valueClass): Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths. |
| static <T extends org.apache.hadoop.io.Writable> Source<T> | sequenceFile(List<org.apache.hadoop.fs.Path> paths, Class<T> valueClass): Creates a Source<T> instance from the SequenceFile(s) at the given Paths from the value field of each key-value pair in the SequenceFile(s). |
| static <K,V> TableSource<K,V> | sequenceFile(List<org.apache.hadoop.fs.Path> paths, PType<K> keyType, PType<V> valueType): Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths. |
| static <T> Source<T> | sequenceFile(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype): Creates a Source<T> instance from the SequenceFile(s) at the given Paths from the value field of each key-value pair in the SequenceFile(s). |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | sequenceFile(org.apache.hadoop.fs.Path path, Class<K> keyClass, Class<V> valueClass): Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path. |
| static <T extends org.apache.hadoop.io.Writable> Source<T> | sequenceFile(org.apache.hadoop.fs.Path path, Class<T> valueClass): Creates a Source<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s). |
| static <K,V> TableSource<K,V> | sequenceFile(org.apache.hadoop.fs.Path path, PType<K> keyType, PType<V> valueType): Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path. |
| static <T> Source<T> | sequenceFile(org.apache.hadoop.fs.Path path, PType<T> ptype): Creates a Source<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s). |
| static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> | sequenceFile(String pathName, Class<K> keyClass, Class<V> valueClass): Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name. |
| static <T extends org.apache.hadoop.io.Writable> Source<T> | sequenceFile(String pathName, Class<T> valueClass): Creates a Source<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s). |
| static <K,V> TableSource<K,V> | sequenceFile(String pathName, PType<K> keyType, PType<V> valueType): Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name. |
| static <T> Source<T> | sequenceFile(String pathName, PType<T> ptype): Creates a Source<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s). |
| static Source<String> | textFile(List<org.apache.hadoop.fs.Path> paths): Creates a Source<String> instance for the text file(s) at the given Paths. |
| static <T> Source<T> | textFile(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype): Creates a Source<T> instance for the text file(s) at the given Paths using the provided PType<T> to convert the input text. |
| static Source<String> | textFile(org.apache.hadoop.fs.Path path): Creates a Source<String> instance for the text file(s) at the given Path. |
| static <T> Source<T> | textFile(org.apache.hadoop.fs.Path path, PType<T> ptype): Creates a Source<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text. |
| static Source<String> | textFile(String pathName): Creates a Source<String> instance for the text file(s) at the given path name. |
| static <T> Source<T> | textFile(String pathName, PType<T> ptype): Creates a Source<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text. |
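
Most factory families in the table above come in three shapes: a String path name, a single Path, and a List of Paths. As a rough sketch of how such overloads can funnel into one code path, the example below uses java.nio.file.Path as a stand-in for Hadoop's Path. MiniSource and MiniFrom are hypothetical names introduced for illustration; this is an assumption about one possible design, not a description of how Crunch implements From.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a source: it only records where to read from
// and which element type to produce.
final class MiniSource<T> {
    final List<Path> paths;
    final Class<T> elementType;

    MiniSource(List<Path> paths, Class<T> elementType) {
        // Defensive copy so later changes to the caller's list are not observed.
        this.paths = Collections.unmodifiableList(new ArrayList<>(paths));
        this.elementType = elementType;
    }
}

// Hypothetical miniature facade with the same three overload shapes.
final class MiniFrom {
    // Path-name variant: converts the name and delegates.
    static MiniSource<String> textFile(String pathName) {
        return textFile(Paths.get(pathName));
    }

    // Single-path variant: wraps the path in a one-element list and delegates.
    static MiniSource<String> textFile(Path path) {
        return textFile(Collections.singletonList(path));
    }

    // List variant: the only overload that constructs the source.
    static MiniSource<String> textFile(List<Path> paths) {
        return new MiniSource<>(paths, String.class);
    }
}
```

With this layout, callers choose whichever argument form is most convenient, and any validation or construction logic lives in a single overload.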

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
Parameters:
pathName - The name of the path to the data on the filesystem
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A TableSource<K, V> instance

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
Parameters:
path - The Path to the data
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A TableSource<K, V> instance

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(List<org.apache.hadoop.fs.Path> paths, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
Parameters:
paths - A list of Paths to the data
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A TableSource<K, V> instance

public static <K,V> TableSource<K,V> formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
Parameters:
pathName - The name of the path to the data on the filesystem
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A TableSource<K, V> instance

public static <K,V> TableSource<K,V> formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
Parameters:
path - The Path to the data
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A TableSource<K, V> instance

public static <K,V> TableSource<K,V> formattedFile(List<org.apache.hadoop.fs.Path> paths, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
Parameters:
paths - A list of Paths to the data
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A TableSource<K, V> instance

public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(String pathName, Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A Source<T> instance

public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(org.apache.hadoop.fs.Path path, Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given Path.
Parameters:
path - The Path to the data
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A Source<T> instance

public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(List<org.apache.hadoop.fs.Path> paths, Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given Paths.
Parameters:
paths - A list of Paths to the data
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A Source<T> instance

public static <T> Source<T> avroFile(String pathName, PType<T> ptype)
Creates a Source<T> instance from the Avro file(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The AvroType for the Avro records
Returns:
A Source<T> instance

public static <T> Source<T> avroFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
Creates a Source<T> instance from the Avro file(s) at the given Path.
Parameters:
path - The Path to the data
ptype - The AvroType for the Avro records
Returns:
A Source<T> instance

public static <T> Source<T> avroFile(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype)
Creates a Source<T> instance from the Avro file(s) at the given Paths.
Parameters:
paths - A list of Paths to the data
ptype - The PType for the Avro records
Returns:
A Source<T> instance

public static Source<org.apache.avro.generic.GenericData.Record> avroFile(String pathName)
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path. If the path is a directory, the schema of a file in the directory will be used to determine the schema to use.
Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A Source<GenericData.Record> instance

public static Source<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path)
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path. If the path is a directory, the schema of a file in the directory will be used to determine the schema to use.
Parameters:
path - The path to the data on the filesystem
Returns:
A Source<GenericData.Record> instance

public static Source<org.apache.avro.generic.GenericData.Record> avroFile(List<org.apache.hadoop.fs.Path> paths)
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given paths. If the path is a directory, the schema of a file in the directory will be used to determine the schema to use.
Parameters:
paths - A list of paths to the data on the filesystem
Returns:
A Source<GenericData.Record> instance

public static Source<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf)
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path using the FileSystem information contained in the given Configuration instance. If the path is a directory, the schema of a file in the directory will be used to determine the schema to use.
Parameters:
path - The path to the data on the filesystem
conf - The configuration information
Returns:
A Source<GenericData.Record> instance

public static Source<org.apache.avro.generic.GenericData.Record> avroFile(List<org.apache.hadoop.fs.Path> paths, org.apache.hadoop.conf.Configuration conf)
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given paths using the FileSystem information contained in the given Configuration instance. If the first path is a directory, the schema of a file in the directory will be used to determine the schema to use.
Parameters:
paths - A list of paths to the data on the filesystem
conf - The configuration information
Returns:
A Source<GenericData.Record> instance

public static <K,V> TableSource<K,V> avroTableFile(org.apache.hadoop.fs.Path path, PTableType<K,V> tableType)
Creates a TableSource<K,V> for reading an Avro key/value file at the given path.
Parameters:
path - The path to the data on the filesystem
tableType - Avro table type for deserializing the table data
Returns:
A TableSource<K,V> instance for reading Avro key/value data

public static <K,V> TableSource<K,V> avroTableFile(List<org.apache.hadoop.fs.Path> paths, PTableType<K,V> tableType)
Creates a TableSource<K,V> for reading an Avro key/value file at the given paths.
Parameters:
paths - The list of paths to be read by the returned source
tableType - Avro table type for deserializing the table data
Returns:
A TableSource<K,V> instance for reading Avro key/value data

public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(String pathName, Class<T> valueClass)
Creates a Source<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s).
Parameters:
pathName - The name of the path to the data on the filesystem
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A Source<T> instance

public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(org.apache.hadoop.fs.Path path, Class<T> valueClass)
Creates a Source<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s).
Parameters:
path - The Path to the data
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A Source<T> instance

public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(List<org.apache.hadoop.fs.Path> paths, Class<T> valueClass)
Creates a Source<T> instance from the SequenceFile(s) at the given Paths from the value field of each key-value pair in the SequenceFile(s).
Parameters:
paths - A list of Paths to the data
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A Source<T> instance

public static <T> Source<T> sequenceFile(String pathName, PType<T> ptype)
Creates a Source<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s).
Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType for the value of the SequenceFile entry
Returns:
A Source<T> instance

public static <T> Source<T> sequenceFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
Creates a Source<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s).
Parameters:
path - The Path to the data
ptype - The PType for the value of the SequenceFile entry
Returns:
A Source<T> instance

public static <T> Source<T> sequenceFile(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype)
Creates a Source<T> instance from the SequenceFile(s) at the given Paths from the value field of each key-value pair in the SequenceFile(s).
Parameters:
paths - A list of Paths to the data
ptype - The PType for the value of the SequenceFile entry
Returns:
A Source<T> instance

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(String pathName, Class<K> keyClass, Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(org.apache.hadoop.fs.Path path, Class<K> keyClass, Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.
Parameters:
path - The Path to the data
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(List<org.apache.hadoop.fs.Path> paths, Class<K> keyClass, Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths.
Parameters:
paths - A list of Paths to the data
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance

public static <K,V> TableSource<K,V> sequenceFile(String pathName, PType<K> keyType, PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance

public static <K,V> TableSource<K,V> sequenceFile(org.apache.hadoop.fs.Path path, PType<K> keyType, PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.
Parameters:
path - The Path to the data
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance

public static <K,V> TableSource<K,V> sequenceFile(List<org.apache.hadoop.fs.Path> paths, PType<K> keyType, PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths.
Parameters:
paths - A list of Paths to the data
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A TableSource<K, V> instance

public static Source<String> textFile(String pathName)
Creates a Source<String> instance for the text file(s) at the given path name.
Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A Source<String> instance

public static Source<String> textFile(org.apache.hadoop.fs.Path path)
Creates a Source<String> instance for the text file(s) at the given Path.
Parameters:
path - The Path to the data
Returns:
A Source<String> instance

public static Source<String> textFile(List<org.apache.hadoop.fs.Path> paths)
Creates a Source<String> instance for the text file(s) at the given Paths.
Parameters:
paths - A list of Paths to the data
Returns:
A Source<String> instance

public static <T> Source<T> textFile(String pathName, PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text.
Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType<T> to use to process the input text
Returns:
A Source<T> instance

public static <T> Source<T> textFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text.
Parameters:
path - The Path to the data
ptype - The PType<T> to use to process the input text
Returns:
A Source<T> instance

public static <T> Source<T> textFile(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given Paths using the provided PType<T> to convert the input text.
Parameters:
paths - A list of Paths to the data
ptype - The PType<T> to use to process the input text
Returns:
A Source<T> instance

Copyright © 2017 The Apache Software Foundation. All rights reserved.