Project Crunch has retired. For details please refer to its Attic page.
At (Apache Crunch 0.10.0 API)

org.apache.crunch.io
Class At

java.lang.Object
  extended by org.apache.crunch.io.At

public class At
extends Object

Static factory methods for creating common SourceTarget types, which may be treated as both a Source and a Target.

The At methods are analogous to the From and To factory methods, but are used for storing intermediate outputs that need to be passed from one run of a MapReduce pipeline to another run. The SourceTarget object acts as both a Source and a Target, which enables it to provide this functionality.

   Pipeline pipeline = new MRPipeline(this.getClass());
   // Create our intermediate storage location
   SourceTarget<String> intermediate = At.textFile("/temptext");
   ...
   // Write out the output of the first phase of a pipeline.
   pipeline.write(phase1, intermediate);

   // Explicitly call run to kick off the pipeline.
   pipeline.run();

   // And then kick off a second phase by consuming the output
   // from the first phase.
   PCollection<String> phase2Input = pipeline.read(intermediate);
   ...

The SourceTarget abstraction is useful when we care about reading the intermediate outputs of a pipeline as well as the final results.
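
The dual role described above can be pictured without any Hadoop or Crunch dependency. The following is only a toy sketch: ToySource, ToyTarget, ToySourceTarget, TextLocation, and runTwoPhases are hypothetical names invented for illustration and are not the Crunch interfaces. It assumes a local temporary file as the intermediate location and shows one object being written by a first phase and read back by a second.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class IntermediateDemo {

    // Hypothetical stand-ins for illustration only, NOT Crunch's own types.
    interface ToySource<T> { List<T> read(); }
    interface ToyTarget<T> { void write(List<T> data); }
    interface ToySourceTarget<T> extends ToySource<T>, ToyTarget<T> { }

    // One text-backed location that can be used in both directions.
    static final class TextLocation implements ToySourceTarget<String> {
        private final Path path;

        TextLocation(Path path) { this.path = path; }

        @Override
        public void write(List<String> data) {
            try {
                Files.write(path, data, StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public List<String> read() {
            try {
                return Files.readAllLines(path, StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Phase 1 writes to the intermediate location; phase 2 reads it back.
    public static List<String> runTwoPhases(List<String> input) {
        Path tmp;
        try {
            tmp = Files.createTempFile("intermediate", ".txt");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            ToySourceTarget<String> intermediate = new TextLocation(tmp);

            // Phase 1: upper-case each line and persist the result.
            List<String> phase1 = new ArrayList<>();
            for (String line : input) {
                phase1.add(line.toUpperCase(Locale.ROOT));
            }
            intermediate.write(phase1);

            // Phase 2: consume phase 1's output through the same object.
            List<String> phase2 = new ArrayList<>();
            for (String line : intermediate.read()) {
                phase2.add(line + "!");
            }
            return phase2;
        } finally {
            try {
                Files.deleteIfExists(tmp);
            } catch (IOException e) {
                // Best-effort cleanup of the temporary file.
            }
        }
    }

    public static void main(String[] args) {
        // prints [ALPHA!, BETA!]
        System.out.println(runTwoPhases(Arrays.asList("alpha", "beta")));
    }
}
```

In a real pipeline the object returned by At.textFile plays the part of TextLocation here, and the two phases are separated by an explicit pipeline.run() call.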


Constructor Summary
At()
           
 
Method Summary
static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path)
          Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path.
static <T extends org.apache.avro.specific.SpecificRecord> SourceTarget<T> avroFile(org.apache.hadoop.fs.Path path, Class<T> avroClass)
          Creates a SourceTarget<T> instance from the Avro file(s) at the given Path.
static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf)
          Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path using the FileSystem information contained in the given Configuration instance.
static <T> SourceTarget<T> avroFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
          Creates a SourceTarget<T> instance from the Avro file(s) at the given Path.
static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(String pathName)
          Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path.
static <T extends org.apache.avro.specific.SpecificRecord> SourceTarget<T> avroFile(String pathName, Class<T> avroClass)
          Creates a SourceTarget<T> instance from the Avro file(s) at the given path name.
static <T> SourceTarget<T> avroFile(String pathName, PType<T> ptype)
          Creates a SourceTarget<T> instance from the Avro file(s) at the given path name.
static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSourceTarget<K,V> sequenceFile(org.apache.hadoop.fs.Path path, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path from the key-value pairs in the SequenceFile(s).
static <T extends org.apache.hadoop.io.Writable> SourceTarget<T> sequenceFile(org.apache.hadoop.fs.Path path, Class<T> valueClass)
          Creates a SourceTarget<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s).
static <K,V> TableSourceTarget<K,V> sequenceFile(org.apache.hadoop.fs.Path path, PType<K> keyType, PType<V> valueType)
          Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path from the key-value pairs in the SequenceFile(s).
static <T> SourceTarget<T> sequenceFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
          Creates a SourceTarget<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s).
static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSourceTarget<K,V> sequenceFile(String pathName, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name from the key-value pairs in the SequenceFile(s).
static <T extends org.apache.hadoop.io.Writable> SourceTarget<T> sequenceFile(String pathName, Class<T> valueClass)
          Creates a SourceTarget<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s).
static <K,V> TableSourceTarget<K,V> sequenceFile(String pathName, PType<K> keyType, PType<V> valueType)
          Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name from the key-value pairs in the SequenceFile(s).
static <T> SourceTarget<T> sequenceFile(String pathName, PType<T> ptype)
          Creates a SourceTarget<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s).
static SourceTarget<String> textFile(org.apache.hadoop.fs.Path path)
          Creates a SourceTarget<String> instance for the text file(s) at the given Path.
static <T> SourceTarget<T> textFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
          Creates a SourceTarget<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text.
static SourceTarget<String> textFile(String pathName)
          Creates a SourceTarget<String> instance for the text file(s) at the given path name.
static <T> SourceTarget<T> textFile(String pathName, PType<T> ptype)
          Creates a SourceTarget<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

At

public At()

Method Detail

avroFile

public static <T extends org.apache.avro.specific.SpecificRecord> SourceTarget<T> avroFile(String pathName,
                                                                                           Class<T> avroClass)
Creates a SourceTarget<T> instance from the Avro file(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A new SourceTarget<T> instance

avroFile

public static <T extends org.apache.avro.specific.SpecificRecord> SourceTarget<T> avroFile(org.apache.hadoop.fs.Path path,
                                                                                           Class<T> avroClass)
Creates a SourceTarget<T> instance from the Avro file(s) at the given Path.

Parameters:
path - The Path to the data
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A new SourceTarget<T> instance

avroFile

public static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(String pathName)
Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path. If the path is a directory, the schema of a file in the directory will be used to determine the schema to use.

Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A new SourceTarget<GenericData.Record> instance

avroFile

public static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path)
Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path. If the path is a directory, the schema of a file in the directory will be used to determine the schema to use.

Parameters:
path - The path to the data on the filesystem
Returns:
A new SourceTarget<GenericData.Record> instance

avroFile

public static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path,
                                                                                org.apache.hadoop.conf.Configuration conf)
Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path using the FileSystem information contained in the given Configuration instance. If the path is a directory, the schema of a file in the directory will be used to determine the schema to use.

Parameters:
path - The path to the data on the filesystem
conf - The configuration information
Returns:
A new SourceTarget<GenericData.Record> instance

avroFile

public static <T> SourceTarget<T> avroFile(String pathName,
                                           PType<T> ptype)
Creates a SourceTarget<T> instance from the Avro file(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType for the Avro records
Returns:
A new SourceTarget<T> instance

avroFile

public static <T> SourceTarget<T> avroFile(org.apache.hadoop.fs.Path path,
                                           PType<T> ptype)
Creates a SourceTarget<T> instance from the Avro file(s) at the given Path.

Parameters:
path - The Path to the data
ptype - The PType for the Avro records
Returns:
A new SourceTarget<T> instance

sequenceFile

public static <T extends org.apache.hadoop.io.Writable> SourceTarget<T> sequenceFile(String pathName,
                                                                                     Class<T> valueClass)
Creates a SourceTarget<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s).

Parameters:
pathName - The name of the path to the data on the filesystem
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new SourceTarget<T> instance

sequenceFile

public static <T extends org.apache.hadoop.io.Writable> SourceTarget<T> sequenceFile(org.apache.hadoop.fs.Path path,
                                                                                     Class<T> valueClass)
Creates a SourceTarget<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s).

Parameters:
path - The Path to the data
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new SourceTarget<T> instance

sequenceFile

public static <T> SourceTarget<T> sequenceFile(String pathName,
                                               PType<T> ptype)
Creates a SourceTarget<T> instance from the SequenceFile(s) at the given path name from the value field of each key-value pair in the SequenceFile(s).

Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType for the value of the SequenceFile entry
Returns:
A new SourceTarget<T> instance

sequenceFile

public static <T> SourceTarget<T> sequenceFile(org.apache.hadoop.fs.Path path,
                                               PType<T> ptype)
Creates a SourceTarget<T> instance from the SequenceFile(s) at the given Path from the value field of each key-value pair in the SequenceFile(s).

Parameters:
path - The Path to the data
ptype - The PType for the value of the SequenceFile entry
Returns:
A new SourceTarget<T> instance

sequenceFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSourceTarget<K,V> sequenceFile(String pathName,
                                                                                                                                    Class<K> keyClass,
                                                                                                                                    Class<V> valueClass)
Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name from the key-value pairs in the SequenceFile(s).

Parameters:
pathName - The name of the path to the data on the filesystem
keyClass - The Writable type for the key of the SequenceFile entry
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new TableSourceTarget<K, V> instance

sequenceFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSourceTarget<K,V> sequenceFile(org.apache.hadoop.fs.Path path,
                                                                                                                                    Class<K> keyClass,
                                                                                                                                    Class<V> valueClass)
Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path from the key-value pairs in the SequenceFile(s).

Parameters:
path - The Path to the data
keyClass - The Writable type for the key of the SequenceFile entry
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new TableSourceTarget<K, V> instance

sequenceFile

public static <K,V> TableSourceTarget<K,V> sequenceFile(String pathName,
                                                        PType<K> keyType,
                                                        PType<V> valueType)
Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given path name from the key-value pairs in the SequenceFile(s).

Parameters:
pathName - The name of the path to the data on the filesystem
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A new TableSourceTarget<K, V> instance

sequenceFile

public static <K,V> TableSourceTarget<K,V> sequenceFile(org.apache.hadoop.fs.Path path,
                                                        PType<K> keyType,
                                                        PType<V> valueType)
Creates a TableSourceTarget<K, V> instance from the SequenceFile(s) at the given Path from the key-value pairs in the SequenceFile(s).

Parameters:
path - The Path to the data
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A new TableSourceTarget<K, V> instance
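
The sequenceFile overloads above come in two shapes: those returning SourceTarget<T> expose only the value field of each entry, while those returning TableSourceTarget<K, V> expose the full key-value pairs. The sketch below models that distinction with plain Java collections only. SequenceViewsDemo, keyValuePairs, valuesOnly, and sampleRecords are hypothetical names, and the in-memory list is an assumed stand-in for the contents of a SequenceFile; no Crunch or Hadoop classes are involved.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SequenceViewsDemo {

    // Table-style view: each record keeps both its key and its value.
    public static <K, V> List<Map.Entry<K, V>> keyValuePairs(List<Map.Entry<K, V>> records) {
        return new ArrayList<>(records);
    }

    // Value-only view: the key of each record is ignored.
    public static <K, V> List<V> valuesOnly(List<Map.Entry<K, V>> records) {
        List<V> values = new ArrayList<>();
        for (Map.Entry<K, V> record : records) {
            values.add(record.getValue());
        }
        return values;
    }

    // Hypothetical sample data standing in for the entries of a SequenceFile.
    public static List<Map.Entry<String, Long>> sampleRecords() {
        List<Map.Entry<String, Long>> records = new ArrayList<>();
        records.add(new SimpleImmutableEntry<>("apple", 3L));
        records.add(new SimpleImmutableEntry<>("pear", 5L));
        return records;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> records = sampleRecords();
        System.out.println(keyValuePairs(records)); // prints [apple=3, pear=5]
        System.out.println(valuesOnly(records));    // prints [3, 5]
    }
}
```

Choosing between the two shapes is therefore a question of whether the next phase of the pipeline needs the keys: if it does, one of the keyClass/valueClass or keyType/valueType overloads is the natural fit.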

textFile

public static SourceTarget<String> textFile(String pathName)
Creates a SourceTarget<String> instance for the text file(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A new SourceTarget<String> instance

textFile

public static SourceTarget<String> textFile(org.apache.hadoop.fs.Path path)
Creates a SourceTarget<String> instance for the text file(s) at the given Path.

Parameters:
path - The Path to the data
Returns:
A new SourceTarget<String> instance

textFile

public static <T> SourceTarget<T> textFile(String pathName,
                                           PType<T> ptype)
Creates a SourceTarget<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text.

Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType<T> to use to process the input text
Returns:
A new SourceTarget<T> instance

textFile

public static <T> SourceTarget<T> textFile(org.apache.hadoop.fs.Path path,
                                           PType<T> ptype)
Creates a SourceTarget<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text.

Parameters:
path - The Path to the data
ptype - The PType<T> to use to process the input text
Returns:
A new SourceTarget<T> instance


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.