This project has retired. For details please refer to its Attic page.
From (Apache Crunch 0.7.0 API)

org.apache.crunch.io
Class From

java.lang.Object
  extended by org.apache.crunch.io.From

public class From
extends Object

Static factory methods for creating common Source types.

The From class is intended to provide a literate API for creating Crunch pipelines from common input file types.

    Pipeline pipeline = new MRPipeline(this.getClass());

    // Reference the lines of a text file by wrapping the TextInputFormat class.
    PCollection<String> lines = pipeline.read(From.textFile("/path/to/myfiles"));

    // Reference entries from a sequence file where the key is a LongWritable and the
    // value is a custom Writable class.
    PTable<LongWritable, MyWritable> table = pipeline.read(From.sequenceFile(
        "/path/to/seqfiles", LongWritable.class, MyWritable.class));

    // Reference the records from an Avro file, where MyAvroObject implements Avro's
    // SpecificRecord interface.
    PCollection<MyAvroObject> myObjects = pipeline.read(From.avroFile("/path/to/avrofiles",
        MyAvroObject.class));

    // Reference the key-value pairs from a custom extension of FileInputFormat.
    PTable<KeyWritable, ValueWritable> custom = pipeline.read(From.formattedFile(
        "/custom", MyFileInputFormat.class, KeyWritable.class, ValueWritable.class));


Constructor Summary
From()
           
 
Method Summary
static <T> Source<T> avroFile(org.apache.hadoop.fs.Path path, AvroType<T> avroType)
          Creates a Source<T> instance from the Avro file(s) at the given Path.
static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(org.apache.hadoop.fs.Path path, Class<T> avroClass)
          Creates a Source<T> instance from the Avro file(s) at the given Path.
static <T> Source<T> avroFile(String pathName, AvroType<T> avroType)
          Creates a Source<T> instance from the Avro file(s) at the given path name.
static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(String pathName, Class<T> avroClass)
          Creates a Source<T> instance from the Avro file(s) at the given path name.
static <K,V> TableSource<K,V> formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType)
          Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
static <K,V> TableSource<K,V> formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType)
          Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(org.apache.hadoop.fs.Path path, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.
static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(org.apache.hadoop.fs.Path path, Class<T> valueClass)
          Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given Path.
static <K,V> TableSource<K,V> sequenceFile(org.apache.hadoop.fs.Path path, PType<K> keyType, PType<V> valueType)
          Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.
static <T> Source<T> sequenceFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
          Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given Path.
static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(String pathName, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.
static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(String pathName, Class<T> valueClass)
          Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given path name.
static <K,V> TableSource<K,V> sequenceFile(String pathName, PType<K> keyType, PType<V> valueType)
          Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.
static <T> Source<T> sequenceFile(String pathName, PType<T> ptype)
          Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given path name.
static Source<String> textFile(org.apache.hadoop.fs.Path path)
          Creates a Source<String> instance for the text file(s) at the given Path.
static <T> Source<T> textFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
          Creates a Source<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text.
static Source<String> textFile(String pathName)
          Creates a Source<String> instance for the text file(s) at the given path name.
static <T> Source<T> textFile(String pathName, PType<T> ptype)
          Creates a Source<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

From

public From()

Method Detail

formattedFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(String pathName,
                                                                                                                               Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass,
                                                                                                                               Class<K> keyClass,
                                                                                                                               Class<V> valueClass)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.

Parameters:
pathName - The name of the path to the data on the filesystem
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A new TableSource<K, V> instance
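
The generic signature above ties keyClass and valueClass to the type parameters of formatClass, so a mismatched pair is rejected at compile time. The plain-Java sketch below illustrates only that typing idea and does not use Crunch or Hadoop. TypedFormatSketch, Format, LongStringFormat, and describe are hypothetical names invented for this illustration, and the Writable bounds of the real signature are left out for brevity.

```java
public class TypedFormatSketch {

    // Hypothetical stand-in for an input format parameterized by key and value type.
    public abstract static class Format<K, V> {}

    // Hypothetical format whose keys are Long and whose values are String.
    public static final class LongStringFormat extends Format<Long, String> {}

    // Mirrors the shape of the signature: the key and value class tokens must
    // agree with the format's type parameters, or the call does not compile.
    public static <K, V> String describe(
            Class<? extends Format<K, V>> formatClass,
            Class<K> keyClass,
            Class<V> valueClass) {
        return formatClass.getSimpleName()
                + "<" + keyClass.getSimpleName()
                + ", " + valueClass.getSimpleName() + ">";
    }

    public static void main(String[] args) {
        // Compiles: Long and String match Format<Long, String>.
        System.out.println(describe(LongStringFormat.class, Long.class, String.class));

        // Swapping the tokens, as in
        //   describe(LongStringFormat.class, String.class, Long.class)
        // is a compile-time error because K cannot be both Long and String.
    }
}
```

The overloads that take PType arguments accept a FileInputFormat<?,?> instead, so in that form the compiler cannot check this agreement between the format and the key and value types.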

formattedFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(org.apache.hadoop.fs.Path path,
                                                                                                                               Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass,
                                                                                                                               Class<K> keyClass,
                                                                                                                               Class<V> valueClass)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.

Parameters:
path - The Path to the data
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A new TableSource<K, V> instance

formattedFile

public static <K,V> TableSource<K,V> formattedFile(String pathName,
                                                   Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass,
                                                   PType<K> keyType,
                                                   PType<V> valueType)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.

Parameters:
pathName - The name of the path to the data on the filesystem
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A new TableSource<K, V> instance

formattedFile

public static <K,V> TableSource<K,V> formattedFile(org.apache.hadoop.fs.Path path,
                                                   Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass,
                                                   PType<K> keyType,
                                                   PType<V> valueType)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.

Parameters:
path - The Path to the data
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A new TableSource<K, V> instance

avroFile

public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(String pathName,
                                                                                     Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A new Source<T> instance

avroFile

public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(org.apache.hadoop.fs.Path path,
                                                                                     Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given Path.

Parameters:
path - The Path to the data
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A new Source<T> instance

avroFile

public static <T> Source<T> avroFile(String pathName,
                                     AvroType<T> avroType)
Creates a Source<T> instance from the Avro file(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
avroType - The AvroType for the Avro records
Returns:
A new Source<T> instance

avroFile

public static <T> Source<T> avroFile(org.apache.hadoop.fs.Path path,
                                     AvroType<T> avroType)
Creates a Source<T> instance from the Avro file(s) at the given Path.

Parameters:
path - The Path to the data
avroType - The AvroType for the Avro records
Returns:
A new Source<T> instance

sequenceFile

public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(String pathName,
                                                                               Class<T> valueClass)
Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new Source<T> instance
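
To make the "value field of each key-value pair" wording concrete, here is a minimal plain-Java sketch that does not use Crunch or Hadoop. ValueProjectionSketch and valuesOf are hypothetical names invented for this illustration. They only mimic, in memory, the idea of exposing a key-value dataset as a collection of its values, and say nothing about how Crunch reads SequenceFiles internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of Crunch: models key-value records
// in memory and keeps only the value of each pair.
public class ValueProjectionSketch {

    // Returns the values of the given pairs in their original order,
    // discarding the keys.
    public static <K, V> List<V> valuesOf(List<Map.Entry<K, V>> pairs) {
        List<V> values = new ArrayList<>(pairs.size());
        for (Map.Entry<K, V> pair : pairs) {
            values.add(pair.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        // Two in-memory records with Long keys and String values.
        List<Map.Entry<Long, String>> pairs = List.of(
                Map.entry(0L, "alpha"),
                Map.entry(1L, "beta"));
        System.out.println(valuesOf(pairs));
    }
}
```

When the keys are needed as well, the three-argument sequenceFile overloads return a TableSource<K, V> that keeps both halves of each pair.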

sequenceFile

public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(org.apache.hadoop.fs.Path path,
                                                                               Class<T> valueClass)
Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given Path.

Parameters:
path - The Path to the data
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new Source<T> instance

sequenceFile

public static <T> Source<T> sequenceFile(String pathName,
                                         PType<T> ptype)
Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType for the value of the SequenceFile entry
Returns:
A new Source<T> instance

sequenceFile

public static <T> Source<T> sequenceFile(org.apache.hadoop.fs.Path path,
                                         PType<T> ptype)
Creates a Source<T> instance that reads the value field of each key-value pair in the SequenceFile(s) at the given Path.

Parameters:
path - The Path to the data
ptype - The PType for the value of the SequenceFile entry
Returns:
A new Source<T> instance

sequenceFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(String pathName,
                                                                                                                              Class<K> keyClass,
                                                                                                                              Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A new TableSource<K, V> instance

sequenceFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(org.apache.hadoop.fs.Path path,
                                                                                                                              Class<K> keyClass,
                                                                                                                              Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.

Parameters:
path - The Path to the data
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A new TableSource<K, V> instance

sequenceFile

public static <K,V> TableSource<K,V> sequenceFile(String pathName,
                                                  PType<K> keyType,
                                                  PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A new TableSource<K, V> instance

sequenceFile

public static <K,V> TableSource<K,V> sequenceFile(org.apache.hadoop.fs.Path path,
                                                  PType<K> keyType,
                                                  PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.

Parameters:
path - The Path to the data
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A new TableSource<K, V> instance

textFile

public static Source<String> textFile(String pathName)
Creates a Source<String> instance for the text file(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A new Source<String> instance

textFile

public static Source<String> textFile(org.apache.hadoop.fs.Path path)
Creates a Source<String> instance for the text file(s) at the given Path.

Parameters:
path - The Path to the data
Returns:
A new Source<String> instance
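
Most factory methods in this class come in pairs, one taking a path name and one taking a Path. One common way to structure such a pair is to have the String overload delegate to the Path overload, as in the plain-Java sketch below. This is an assumption about the general pattern, not a statement about how Crunch implements it. OverloadSketch and TextSource are hypothetical names, and java.nio.file.Path stands in for org.apache.hadoop.fs.Path so that the example needs no Hadoop dependency.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class OverloadSketch {

    // Hypothetical minimal "source" that only remembers where to read from.
    public static final class TextSource {
        private final Path path;

        TextSource(Path path) {
            this.path = path;
        }

        public Path getPath() {
            return path;
        }
    }

    // Path-name overload: converts the name and delegates to the Path overload.
    public static TextSource textFile(String pathName) {
        return textFile(Paths.get(pathName));
    }

    // Path overload: the single place where the source is actually created.
    public static TextSource textFile(Path path) {
        return new TextSource(path);
    }

    public static void main(String[] args) {
        TextSource byName = textFile("/path/to/myfiles");
        TextSource byPath = textFile(Paths.get("/path/to/myfiles"));
        // Both overloads describe the same location.
        System.out.println(byName.getPath().equals(byPath.getPath()));
    }
}
```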

textFile

public static <T> Source<T> textFile(String pathName,
                                     PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text.

Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType<T> to use to process the input text
Returns:
A new Source<T> instance
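
The ptype argument determines how each line of text becomes a T. As a rough analogy only, the plain-Java sketch below converts lines with a java.util.function.Function. LineConversionSketch and convertLines are hypothetical names, and the Function merely stands in for the conversion role that a PType<T> plays here. It is not Crunch's actual mechanism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in, not part of Crunch: applies a per-line converter
// to in-memory text, analogous to supplying a conversion for text input.
public class LineConversionSketch {

    // Converts every input line with the given converter, preserving order.
    public static <T> List<T> convertLines(List<String> lines, Function<String, T> converter) {
        List<T> converted = new ArrayList<>(lines.size());
        for (String line : lines) {
            converted.add(converter.apply(line));
        }
        return converted;
    }

    public static void main(String[] args) {
        // Each line holds one integer, so Integer::parseInt serves as the converter.
        List<Integer> numbers = convertLines(List.of("1", "2", "3"), Integer::parseInt);
        System.out.println(numbers);
    }
}
```

The single-argument textFile overloads skip this step and yield each line as a String.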

textFile

public static <T> Source<T> textFile(org.apache.hadoop.fs.Path path,
                                     PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text.

Parameters:
path - The Path to the data
ptype - The PType<T> to use to process the input text
Returns:
A new Source<T> instance


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.