This project has retired. For details please refer to its Attic page.
Source (Apache Crunch 0.10.0 API)

org.apache.crunch
Interface Source<T>

All Known Subinterfaces:
ReadableSource<T>, ReadableSourceTarget<T>, SourceTarget<T>, TableSource<K,V>, TableSourceTarget<K,V>
All Known Implementing Classes:
DataBaseSource, org.apache.crunch.io.impl.FileSourceImpl

public interface Source<T>

A Source represents a data set that serves as an input to one or more MapReduce jobs.


Method Summary
 void configureSource(org.apache.hadoop.mapreduce.Job job, int inputId)
          Configure the given job to use this source as an input.
 Converter<?,?,?,?> getConverter()
          Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.
 long getLastModifiedAt(org.apache.hadoop.conf.Configuration configuration)
          Returns the time (in milliseconds) at which this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory).
 long getSize(org.apache.hadoop.conf.Configuration configuration)
          Returns the number of bytes in this Source.
 PType<T> getType()
          Returns the PType for this source.
 Source<T> inputConf(String key, String value)
          Adds the given key-value pair to the Configuration instance that is used to read this Source<T>.
 

Method Detail

inputConf

Source<T> inputConf(String key,
                    String value)
Adds the given key-value pair to the Configuration instance that is used to read this Source<T>. Allows for multiple inputs to re-use the same config keys with different values when necessary.
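
The per-source scoping described above can be illustrated with a small, self-contained sketch. It does not use Crunch itself: SketchSource is a hypothetical stand-in class, and the keys example.delimiter and example.skip.header are made-up names. The only things taken from the documentation are the fluent shape of inputConf (it returns the source) and the idea that two inputs may carry different values for the same key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InputConfSketch {

    /** Hypothetical stand-in for a Source; NOT the Crunch implementation. */
    static final class SketchSource {
        private final String name;
        // Settings are held per source instead of in one shared job-wide map,
        // so two sources can use the same key without overwriting each other.
        private final Map<String, String> conf = new LinkedHashMap<>();

        SketchSource(String name) {
            this.name = name;
        }

        // Mirrors the documented shape: returns the source so calls can be chained.
        SketchSource inputConf(String key, String value) {
            conf.put(key, value);
            return this;
        }

        // Looks up a setting for this source only; null if it was never set.
        String get(String key) {
            return conf.get(key);
        }

        String describe() {
            return name + " " + conf;
        }
    }

    public static void main(String[] args) {
        // Same (made-up) key, different value for each input.
        SketchSource tabs = new SketchSource("tab-separated")
                .inputConf("example.delimiter", "\t")
                .inputConf("example.skip.header", "true");
        SketchSource commas = new SketchSource("comma-separated")
                .inputConf("example.delimiter", ",");

        System.out.println(tabs.describe());
        System.out.println(commas.describe());
    }
}
```

Keeping the settings on the source rather than on the job is what lets both inputs define example.delimiter in this sketch; how Crunch stores and applies these pairs internally is not specified on this page.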


getType

PType<T> getType()
Returns the PType for this source.


getConverter

Converter<?,?,?,?> getConverter()
Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.


configureSource

void configureSource(org.apache.hadoop.mapreduce.Job job,
                     int inputId)
                     throws IOException
Configure the given job to use this source as an input.

Parameters:
job - The job to configure
inputId - For a multi-input job, an identifier for this input to the job
Throws:
IOException

getSize

long getSize(org.apache.hadoop.conf.Configuration configuration)
Returns the number of bytes in this Source.


getLastModifiedAt

long getLastModifiedAt(org.apache.hadoop.conf.Configuration configuration)
Returns the time (in milliseconds) at which this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory).
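
One way to read "most recently modified" for a directory-backed input is as the latest timestamp among its files, so that either editing a file or adding a new one moves the value forward. The sketch below illustrates that reading with plain Java. The class, both method names, the -1 "unknown" convention, and the rerun check are assumptions made for this example; they are not taken from Crunch.

```java
import java.util.Arrays;
import java.util.List;

public class LastModifiedSketch {

    // Hypothetical helper: the latest modification time (ms) across a set of
    // files. Returns -1 for an empty set; that convention belongs to this
    // sketch only and is not documented on this page.
    static long lastModifiedAt(List<Long> fileTimestampsMs) {
        long latest = -1L;
        for (long ts : fileTimestampsMs) {
            latest = Math.max(latest, ts);
        }
        return latest;
    }

    // Hypothetical use of the value: treat a previously written output as
    // out of date when the input changed after the output was produced.
    static boolean needsRerun(long sourceModifiedMs, long outputWrittenMs) {
        return sourceModifiedMs > outputWrittenMs;
    }

    public static void main(String[] args) {
        List<Long> before = Arrays.asList(1_000L, 5_000L, 3_000L);
        // A new file added at t = 7000 ms moves the source's timestamp forward.
        List<Long> after = Arrays.asList(1_000L, 5_000L, 3_000L, 7_000L);

        long outputWrittenMs = 6_000L;
        System.out.println(needsRerun(lastModifiedAt(before), outputWrittenMs));
        System.out.println(needsRerun(lastModifiedAt(after), outputWrittenMs));
    }
}
```

With a real Source, the timestamp would come from getLastModifiedAt(Configuration) rather than from a list of numbers; the comparison step is the part this sketch is meant to show.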



Copyright © 2014 The Apache Software Foundation. All Rights Reserved.