This project has retired. For details please refer to its Attic page.
FileSourceImpl (Apache Crunch 0.9.0 API)

org.apache.crunch.io.impl
Class FileSourceImpl<T>

java.lang.Object
  extended by org.apache.crunch.io.impl.FileSourceImpl<T>
All Implemented Interfaces:
Source<T>
Direct Known Subclasses:
AvroFileSource, AvroParquetFileSource, DataBaseSource, FileTableSourceImpl, HFileSource, NLineFileSource, SeqFileSource, TextFileSource, TrevniKeySource

public class FileSourceImpl<T>
extends Object
implements Source<T>


Field Summary
protected  FormatBundle<? extends org.apache.hadoop.mapreduce.InputFormat> inputBundle
           
protected  org.apache.hadoop.fs.Path path
          Deprecated. 
protected  List<org.apache.hadoop.fs.Path> paths
           
protected  PType<T> ptype
           
 
Constructor Summary
FileSourceImpl(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)
           
FileSourceImpl(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype, FormatBundle<? extends org.apache.hadoop.mapreduce.InputFormat> inputBundle)
           
FileSourceImpl(org.apache.hadoop.fs.Path path, PType<T> ptype, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)
           
FileSourceImpl(org.apache.hadoop.fs.Path path, PType<T> ptype, FormatBundle<? extends org.apache.hadoop.mapreduce.InputFormat> inputBundle)
           
 
Method Summary
 void configureSource(org.apache.hadoop.mapreduce.Job job, int inputId)
          Configure the given job to use this source as an input.
 boolean equals(Object other)
           
 FormatBundle<? extends org.apache.hadoop.mapreduce.InputFormat> getBundle()
           
 Converter<?,?,?,?> getConverter()
          Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.
 long getLastModifiedAt(org.apache.hadoop.conf.Configuration conf)
          Returns the time (in milliseconds) that this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory).
 org.apache.hadoop.fs.Path getPath()
          Deprecated. 
 List<org.apache.hadoop.fs.Path> getPaths()
           
 long getSize(org.apache.hadoop.conf.Configuration configuration)
          Returns the number of bytes in this Source.
 PType<T> getType()
          Returns the PType for this source.
 int hashCode()
           
 Source<T> inputConf(String key, String value)
          Adds the given key-value pair to the Configuration instance that is used to read this Source<T>.
protected  String pathsAsString()
           
protected  Iterable<T> read(org.apache.hadoop.conf.Configuration conf, FileReaderFactory<T> readerFactory)
           
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

path

@Deprecated
protected final org.apache.hadoop.fs.Path path
Deprecated. 

paths

protected final List<org.apache.hadoop.fs.Path> paths

ptype

protected final PType<T> ptype

inputBundle

protected final FormatBundle<? extends org.apache.hadoop.mapreduce.InputFormat> inputBundle

Constructor Detail

FileSourceImpl

public FileSourceImpl(org.apache.hadoop.fs.Path path,
                      PType<T> ptype,
                      Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)

FileSourceImpl

public FileSourceImpl(org.apache.hadoop.fs.Path path,
                      PType<T> ptype,
                      FormatBundle<? extends org.apache.hadoop.mapreduce.InputFormat> inputBundle)

FileSourceImpl

public FileSourceImpl(List<org.apache.hadoop.fs.Path> paths,
                      PType<T> ptype,
                      Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)

FileSourceImpl

public FileSourceImpl(List<org.apache.hadoop.fs.Path> paths,
                      PType<T> ptype,
                      FormatBundle<? extends org.apache.hadoop.mapreduce.InputFormat> inputBundle)
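The four constructors differ only in whether a single path or a list of paths is given, and whether the input format arrives as a bare class or as a FormatBundle. The JDK-only sketch below models one plausible relationship between them. It assumes, without confirmation from this page, that a single path is treated as a one-element list and that a bare class is wrapped in a bundle with no extra settings. SourceModel and BundleModel are hypothetical stand-ins for FileSourceImpl and FormatBundle, and paths, types, and format names are plain illustrative strings.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for FormatBundle: an input-format name plus extra settings.
final class BundleModel {
  final String formatClass;
  final Map<String, String> extraConf;

  BundleModel(String formatClass, Map<String, String> extraConf) {
    this.formatClass = Objects.requireNonNull(formatClass);
    this.extraConf = Map.copyOf(extraConf);
  }

  // Assumption: a bare format class becomes a bundle with no extra settings.
  static BundleModel forInput(String formatClass) {
    return new BundleModel(formatClass, Map.of());
  }
}

// Hypothetical stand-in for FileSourceImpl; ptype is a plain label here.
final class SourceModel {
  final List<String> paths;
  final String ptype;
  final BundleModel bundle;

  // Canonical form: a list of paths plus a bundle.
  SourceModel(List<String> paths, String ptype, BundleModel bundle) {
    this.paths = List.copyOf(paths);
    this.ptype = Objects.requireNonNull(ptype);
    this.bundle = Objects.requireNonNull(bundle);
  }

  // Assumption: a single path is treated as a one-element list.
  SourceModel(String path, String ptype, BundleModel bundle) {
    this(List.of(path), ptype, bundle);
  }

  SourceModel(List<String> paths, String ptype, String formatClass) {
    this(paths, ptype, BundleModel.forInput(formatClass));
  }

  SourceModel(String path, String ptype, String formatClass) {
    this(List.of(path), ptype, BundleModel.forInput(formatClass));
  }
}

final class ConstructorDemo {
  public static void main(String[] args) {
    // Hypothetical path and format names, for illustration only.
    SourceModel single = new SourceModel("/data/in", "strings", "TextInputFormat");
    SourceModel listed = new SourceModel(List.of("/data/in"), "strings", BundleModel.forInput("TextInputFormat"));
    System.out.println(single.paths.equals(listed.paths)); // true
  }
}
```

Under these assumptions every form reduces to the list-plus-bundle form, so the rest of the class only has to handle one representation.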

Method Detail

getPath

@Deprecated
public org.apache.hadoop.fs.Path getPath()
Deprecated. 


getPaths

public List<org.apache.hadoop.fs.Path> getPaths()

inputConf

public Source<T> inputConf(String key,
                           String value)
Description copied from interface: Source
Adds the given key-value pair to the Configuration instance that is used to read this Source<T>. Allows multiple inputs to reuse the same configuration keys with different values when necessary.

Specified by:
inputConf in interface Source<T>
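The per-input override described above can be modeled without Hadoop. In this sketch ConfiguredInput is hypothetical, a Map<String, String> stands in for Configuration, and returning this for chaining is an assumption suggested by the Source<T> return type. The key textinputformat.record.delimiter is a real Hadoop setting for text input, used here only as an example of one key that two inputs set differently.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a source that carries its own configuration overrides.
// A plain Map stands in for Hadoop's Configuration.
final class ConfiguredInput {
  private final Map<String, String> overrides = new LinkedHashMap<>();

  // Records a key-value pair that applies only when this input is read.
  // Returning this (assumed) allows calls to be chained.
  ConfiguredInput inputConf(String key, String value) {
    overrides.put(key, value);
    return this;
  }

  // Effective settings for this input: the job-wide values with this
  // input's overrides layered on top. The base map is left unchanged.
  Map<String, String> effectiveConf(Map<String, String> base) {
    Map<String, String> merged = new HashMap<>(base);
    merged.putAll(overrides);
    return merged;
  }
}

final class InputConfDemo {
  public static void main(String[] args) {
    String key = "textinputformat.record.delimiter";
    Map<String, String> jobConf = Map.of(key, "\n");
    ConfiguredInput semicolons = new ConfiguredInput().inputConf(key, ";");
    ConfiguredInput pipes = new ConfiguredInput().inputConf(key, "|");
    System.out.println(semicolons.effectiveConf(jobConf).get(key)); // ;
    System.out.println(pipes.effectiveConf(jobConf).get(key));      // |
  }
}
```

Both inputs share one key, yet each reads its own value and the job-wide setting stays untouched.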

getConverter

public Converter<?,?,?,?> getConverter()
Description copied from interface: Source
Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.

Specified by:
getConverter in interface Source<T>
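A Converter sits between the key-value records an InputFormat produces and the values a PCollection or PTable holds. The sketch below models only that input direction through a hypothetical InputConverter interface; it is not Crunch's Converter API. It assumes text-style input, where each record's key is a byte offset and its value is a line, and a pipeline that keeps only the line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical input-side converter: maps one (key, value) record to a pipeline value.
interface InputConverter<K, V, T> {
  T convertInput(K key, V value);
}

final class ConverterDemo {
  // Keeps the value (the line) and discards the key (the byte offset).
  static final InputConverter<Long, String, String> VALUE_ONLY = (offset, line) -> line;

  // Applies the converter to each raw record, preserving order.
  static <K, V, T> List<T> convertAll(List<Map.Entry<K, V>> records, InputConverter<K, V, T> converter) {
    List<T> out = new ArrayList<>();
    for (Map.Entry<K, V> record : records) {
      out.add(converter.convertInput(record.getKey(), record.getValue()));
    }
    return out;
  }

  public static void main(String[] args) {
    // "alpha\n" is 6 bytes, so the second line starts at offset 6.
    List<Map.Entry<Long, String>> raw = List.of(Map.entry(0L, "alpha"), Map.entry(6L, "beta"));
    System.out.println(convertAll(raw, VALUE_ONLY)); // [alpha, beta]
  }
}
```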

configureSource

public void configureSource(org.apache.hadoop.mapreduce.Job job,
                            int inputId)
                     throws IOException
Description copied from interface: Source
Configure the given job to use this source as an input.

Specified by:
configureSource in interface Source<T>
Parameters:
job - The job to configure
inputId - For a multi-input job, an identifier for this input to the job
Throws:
IOException
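In a multi-input job, inputId is what keeps each source's paths and input format apart. The JDK-only sketch below models that bookkeeping with a hypothetical JobModel that stores one entry per input id. It does not reflect how Hadoop's Job or Crunch actually record inputs, and any special meaning Crunch may give to particular id values is not modeled. Paths and format names are illustrative strings.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical description of what one input contributes to a job.
final class InputSpec {
  final List<String> paths;
  final String formatClass;

  InputSpec(List<String> paths, String formatClass) {
    this.paths = List.copyOf(paths);
    this.formatClass = formatClass;
  }
}

// Hypothetical job model that keeps its inputs keyed by input id.
final class JobModel {
  final Map<Integer, InputSpec> inputs = new TreeMap<>();

  // Analogue of configureSource(job, inputId): registers one input under its id.
  // Rejecting a reused id is a choice made for this sketch only.
  void configureSource(int inputId, List<String> paths, String formatClass) {
    if (inputs.containsKey(inputId)) {
      throw new IllegalArgumentException("input id already in use: " + inputId);
    }
    inputs.put(inputId, new InputSpec(paths, formatClass));
  }
}

final class ConfigureDemo {
  public static void main(String[] args) {
    JobModel job = new JobModel();
    // Two sources with different formats feed the same job.
    job.configureSource(0, List.of("/logs/a"), "TextInputFormat");
    job.configureSource(1, List.of("/events"), "SequenceFileInputFormat");
    for (Map.Entry<Integer, InputSpec> e : job.inputs.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue().paths + " via " + e.getValue().formatClass);
    }
  }
}
```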

getBundle

public FormatBundle<? extends org.apache.hadoop.mapreduce.InputFormat> getBundle()

getType

public PType<T> getType()
Description copied from interface: Source
Returns the PType for this source.

Specified by:
getType in interface Source<T>

getSize

public long getSize(org.apache.hadoop.conf.Configuration configuration)
Description copied from interface: Source
Returns the number of bytes in this Source.

Specified by:
getSize in interface Source<T>
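For a file-based source, the number of bytes is naturally the combined length of the files behind its paths. Crunch works against Hadoop's FileSystem; this sketch swaps in java.nio.file on the local disk and assumes that a directory path counts every regular file beneath it, recursively. SizeModel is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class SizeModel {
  // Sums the sizes of all regular files under each path: a file contributes
  // its own length, a directory contributes every file beneath it.
  static long totalSize(List<Path> paths) {
    long total = 0L;
    for (Path p : paths) {
      try (Stream<Path> entries = Files.walk(p)) {
        total += entries.filter(Files::isRegularFile).mapToLong(SizeModel::sizeOf).sum();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    return total;
  }

  private static long sizeOf(Path file) {
    try {
      return Files.size(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

final class SizeDemo {
  public static void main(String[] args) {
    // Pass one or more local files or directories on the command line.
    List<Path> inputs = new ArrayList<>();
    for (String arg : args) {
      inputs.add(Paths.get(arg));
    }
    System.out.println(SizeModel.totalSize(inputs) + " bytes");
  }
}
```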

read

protected Iterable<T> read(org.apache.hadoop.conf.Configuration conf,
                           FileReaderFactory<T> readerFactory)
                    throws IOException
Throws:
IOException
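read turns the source's paths into an Iterable<T> with help from a FileReaderFactory<T>. The JDK-only sketch below shows one plausible shape under two assumptions not stated on this page: the factory opens one iterator per path, and the per-path results are concatenated in path order, opening each path only when iteration reaches it. PathReaderFactory is a hypothetical stand-in, and paths are plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for FileReaderFactory<T>: opens an iterator over one path.
interface PathReaderFactory<T> {
  Iterator<T> read(String path);
}

final class ReadModel {
  // Returns an Iterable whose iterators open each path lazily, in order.
  static <T> Iterable<T> read(List<String> paths, PathReaderFactory<T> factory) {
    return () -> new Iterator<T>() {
      private final Iterator<String> remaining = paths.iterator();
      private Iterator<T> current = Collections.emptyIterator();

      @Override
      public boolean hasNext() {
        // Skip past exhausted (or empty) paths until a record is available.
        while (!current.hasNext() && remaining.hasNext()) {
          current = factory.read(remaining.next());
        }
        return current.hasNext();
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return current.next();
      }
    };
  }
}

final class ReadDemo {
  public static void main(String[] args) {
    // Hypothetical in-memory "files" keyed by path.
    Map<String, List<String>> files = Map.of(
        "/p1", List.of("a", "b"),
        "/p2", List.of(),
        "/p3", List.of("c"));
    Iterable<String> records = ReadModel.<String>read(List.of("/p1", "/p2", "/p3"), p -> files.get(p).iterator());
    List<String> all = new ArrayList<>();
    for (String r : records) {
      all.add(r);
    }
    System.out.println(all); // [a, b, c]
  }
}
```

Opening paths lazily keeps reading cheap for callers that stop early, and each call to iterator() starts again from the first path.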

pathsAsString

protected String pathsAsString()

getLastModifiedAt

public long getLastModifiedAt(org.apache.hadoop.conf.Configuration conf)
Description copied from interface: Source
Returns the time (in milliseconds) that this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory).

Specified by:
getLastModifiedAt in interface Source<T>
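The rule described above, that a source counts as modified when any of its inputs changes, amounts to taking the newest timestamp under its paths. This hypothetical sketch uses java.nio.file on the local disk in place of Hadoop's FileSystem. It also includes directories' own timestamps, on the assumption that a local filesystem updates a directory's modification time when files are added to it, and it returns -1 for an empty path list.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class LastModifiedModel {
  // Newest modification time (epoch milliseconds) of any file or directory
  // under the given paths, including the paths themselves; -1 if the list is empty.
  static long lastModifiedAt(List<Path> paths) {
    long latest = -1L;
    for (Path p : paths) {
      try (Stream<Path> entries = Files.walk(p)) {
        long newest = entries.mapToLong(LastModifiedModel::mtime).max().orElse(-1L);
        latest = Math.max(latest, newest);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    return latest;
  }

  private static long mtime(Path p) {
    try {
      return Files.getLastModifiedTime(p).toMillis();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

final class LastModifiedDemo {
  public static void main(String[] args) {
    // Pass one or more local files or directories on the command line.
    List<Path> inputs = new ArrayList<>();
    for (String arg : args) {
      inputs.add(Paths.get(arg));
    }
    System.out.println(LastModifiedModel.lastModifiedAt(inputs));
  }
}
```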

equals

public boolean equals(Object other)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object
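equals and hashCode are listed without saying which state they compare. The sketch below shows value-based equality under the assumption, not confirmed on this page, that two sources are equal when their paths, type, and input-format bundle are all equal. SourceKey is a hypothetical stand-in that uses strings for each component. Because hashCode is built from the same fields as equals, equal sources land in the same bucket of a HashSet or HashMap.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a source, compared by value.
final class SourceKey {
  private final List<String> paths;
  private final String ptype;
  private final String bundle;

  SourceKey(List<String> paths, String ptype, String bundle) {
    this.paths = List.copyOf(paths);
    this.ptype = ptype;
    this.bundle = bundle;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof SourceKey)) {
      return false;
    }
    SourceKey o = (SourceKey) other;
    return paths.equals(o.paths)
        && Objects.equals(ptype, o.ptype)
        && Objects.equals(bundle, o.bundle);
  }

  @Override
  public int hashCode() {
    // Built from the same fields as equals, so equal keys hash equally.
    return Objects.hash(paths, ptype, bundle);
  }
}

final class SourceKeyDemo {
  public static void main(String[] args) {
    SourceKey first = new SourceKey(List.of("/in"), "strings", "TextInputFormat");
    SourceKey again = new SourceKey(List.of("/in"), "strings", "TextInputFormat");
    Set<SourceKey> seen = new HashSet<>();
    seen.add(first);
    System.out.println(seen.contains(again)); // true
  }
}
```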

toString

public String toString()
Overrides:
toString in class Object


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.