public interface Source<T>
A Source represents an input data set that is an input to one or more MapReduce jobs.

Modifier and Type | Method and Description |
---|---|
void | configureSource(org.apache.hadoop.mapreduce.Job job, int inputId): Configure the given job to use this source as an input. |
Converter<?,?,?,?> | getConverter(): Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values. |
long | getLastModifiedAt(org.apache.hadoop.conf.Configuration configuration): Returns the time (in milliseconds) that this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory). |
long | getSize(org.apache.hadoop.conf.Configuration configuration): Returns the number of bytes in this Source. |
PType<T> | getType(): Returns the PType for this source. |
Source<T> | inputConf(String key, String value): Adds the given key-value pair to the Configuration instance that is used to read this Source<T>. |
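As a rough illustration of what getSize and getLastModifiedAt report for a file-backed input, the sketch below computes both values over a local directory with java.nio.file. It is not Crunch code. The class SourceStatsSketch and its methods are hypothetical, it reads the local filesystem instead of going through a Hadoop Configuration, and returning -1 when no files are present is a convention assumed only for this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in; not part of Crunch or Hadoop.
public class SourceStatsSketch {

    // Returns the input itself if it is a file, otherwise the regular files
    // directly inside the directory (non-recursive).
    public static List<Path> inputFiles(Path input) throws IOException {
        if (Files.isRegularFile(input)) {
            return Collections.singletonList(input);
        }
        try (Stream<Path> entries = Files.list(input)) {
            return entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    // Analogue of getSize: total number of bytes across the input files.
    public static long sizeInBytes(Path input) throws IOException {
        long total = 0L;
        for (Path file : inputFiles(input)) {
            total += Files.size(file);
        }
        return total;
    }

    // Analogue of getLastModifiedAt: newest modification time in milliseconds
    // since the epoch, or -1 if there are no files (assumed convention).
    public static long lastModifiedAt(Path input) throws IOException {
        long newest = -1L;
        for (Path file : inputFiles(input)) {
            newest = Math.max(newest, Files.getLastModifiedTime(file).toMillis());
        }
        return newest;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("source-sketch");
        Files.write(dir.resolve("part-0"), new byte[] {1, 2, 3});
        Files.write(dir.resolve("part-1"), new byte[] {4, 5});
        System.out.println(sizeInBytes(dir));    // total bytes across both files
        System.out.println(lastModifiedAt(dir)); // newest modification time in ms
    }
}
```

Adding a file to the directory changes both values, which matches the "new files were added to a directory" case mentioned in the description of getLastModifiedAt.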
Source<T> inputConf(String key, String value)
Adds the given key-value pair to the Configuration instance that is used to read this Source<T>. Allows for multiple inputs to re-use the same config keys with different values when necessary.

Converter<?,?,?,?> getConverter()
Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.

void configureSource(org.apache.hadoop.mapreduce.Job job, int inputId) throws IOException
Configure the given job to use this source as an input.
Parameters:
job - The job to configure
inputId - For a multi-input job, an identifier for this input to the job
Throws:
IOException
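To make the idea of several inputs sharing one config key with different values more concrete, here is a minimal self-contained sketch. Everything in it is a stand-in: SketchSource is a hypothetical class (not the Crunch Source interface), a plain Map takes the place of the Hadoop job configuration, and the "input.<id>." key prefix is an assumption made for illustration, not a description of how Crunch stores per-input settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; none of these types belong to Crunch or Hadoop.
public class PerInputConfSketch {

    public static final class SketchSource {
        private final String name;
        private final Map<String, String> extraConf = new LinkedHashMap<>();

        public SketchSource(String name) {
            this.name = name;
        }

        // Mirrors the fluent shape of inputConf: record the pair and
        // return this source so calls can be chained.
        public SketchSource inputConf(String key, String value) {
            extraConf.put(key, value);
            return this;
        }

        // Loosely mirrors configureSource: copies this source's pairs into the
        // shared settings under a prefix derived from inputId (assumed scheme),
        // so identical keys from different inputs do not overwrite each other.
        public void configureSource(Map<String, String> jobSettings, int inputId) {
            String prefix = "input." + inputId + ".";
            jobSettings.put(prefix + "name", name);
            for (Map.Entry<String, String> e : extraConf.entrySet()) {
                jobSettings.put(prefix + e.getKey(), e.getValue());
            }
        }
    }

    // Builds the settings for a two-input job where both inputs set "delimiter".
    public static Map<String, String> demo() {
        SketchSource csv = new SketchSource("csv").inputConf("delimiter", ",");
        SketchSource psv = new SketchSource("psv").inputConf("delimiter", "|");
        Map<String, String> jobSettings = new LinkedHashMap<>();
        csv.configureSource(jobSettings, 0);
        psv.configureSource(jobSettings, 1);
        return jobSettings;
    }

    public static void main(String[] args) {
        demo().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Both stand-in sources set the key "delimiter", yet each value survives because it is stored under that input's own identifier. Whatever mechanism a real implementation uses, this is the behavior the inputConf description calls for.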
long getSize(org.apache.hadoop.conf.Configuration configuration)
Returns the number of bytes in this Source.

long getLastModifiedAt(org.apache.hadoop.conf.Configuration configuration)
Returns the time (in milliseconds) that this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory).

Copyright © 2016 The Apache Software Foundation. All rights reserved.