This project has retired. For details please refer to its Attic page.
NLineFileSource (Apache Crunch 0.9.0 API)

org.apache.crunch.io.text
Class NLineFileSource<T>

java.lang.Object
  extended by org.apache.crunch.io.impl.FileSourceImpl<T>
      extended by org.apache.crunch.io.text.NLineFileSource<T>
All Implemented Interfaces:
ReadableSource<T>, Source<T>

public class NLineFileSource<T>
extends FileSourceImpl<T>
implements ReadableSource<T>

A Source instance that uses the NLineInputFormat, which gives each map task a fixed number of lines from a text file as input (set by the linesPerTask constructor argument). Most useful when running simulations on Hadoop, where each line holds the configuration for one simulation run.
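
As a rough illustration of the behaviour described above, the following sketch mimics how the lines of one text file would be handed out in groups of linesPerTask. It is a plain-Java simulation that does not call Crunch or Hadoop; the class LineGrouping, its group method, and the sample configuration lines are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Crunch or Hadoop: it only mimics how the
// lines of a text file are handed out in groups of linesPerTask.
final class LineGrouping {

  // Splits lines into consecutive groups of at most linesPerTask lines.
  // The last group may be shorter when the line count is not a multiple.
  static List<List<String>> group(List<String> lines, int linesPerTask) {
    if (linesPerTask <= 0) {
      throw new IllegalArgumentException("linesPerTask must be positive");
    }
    List<List<String>> groups = new ArrayList<>();
    for (int start = 0; start < lines.size(); start += linesPerTask) {
      int end = Math.min(start + linesPerTask, lines.size());
      groups.add(new ArrayList<>(lines.subList(start, end)));
    }
    return groups;
  }

  public static void main(String[] args) {
    // Each line stands for the configuration of one simulation run.
    List<String> configs = Arrays.asList(
        "seed=1 steps=100", "seed=2 steps=100", "seed=3 steps=100",
        "seed=4 steps=100", "seed=5 steps=100");
    List<List<String>> groups = group(configs, 2);
    for (int i = 0; i < groups.size(); i++) {
      System.out.println("task " + i + ": " + groups.get(i));
    }
  }
}
```

With five configuration lines and linesPerTask set to 2, the sketch produces three groups, the last holding a single line. A small linesPerTask spreads simulation runs over more map tasks, at the cost of more task start-up overhead.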


Field Summary
 
Fields inherited from class org.apache.crunch.io.impl.FileSourceImpl
inputBundle, path, paths, ptype
 
Constructor Summary
NLineFileSource(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype, int linesPerTask)
          Create a new NLineFileSource instance that reads from a list of Paths.
NLineFileSource(org.apache.hadoop.fs.Path path, PType<T> ptype, int linesPerTask)
          Create a new NLineFileSource instance that reads from a single Path.
NLineFileSource(String path, PType<T> ptype, int linesPerTask)
          Create a new NLineFileSource instance that reads from a path given as a String.
 
Method Summary
 ReadableData<T> asReadable()
          Returns a ReadableData instance containing the data referenced by this ReadableSource.
 Iterable<T> read(org.apache.hadoop.conf.Configuration conf)
          Returns an Iterable that contains the contents of this source.
 String toString()
           
 
Methods inherited from class org.apache.crunch.io.impl.FileSourceImpl
configureSource, equals, getBundle, getConverter, getLastModifiedAt, getPath, getPaths, getSize, getType, hashCode, inputConf, pathsAsString, read
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.apache.crunch.Source
configureSource, getConverter, getLastModifiedAt, getSize, getType, inputConf
 

Constructor Detail

NLineFileSource

public NLineFileSource(String path,
                       PType<T> ptype,
                       int linesPerTask)
Create a new NLineFileSource instance.

Parameters:
path - The path to the input data, as a String
ptype - The PType to use for processing the data
linesPerTask - The number of lines from the input each map task will process

NLineFileSource

public NLineFileSource(org.apache.hadoop.fs.Path path,
                       PType<T> ptype,
                       int linesPerTask)
Create a new NLineFileSource instance.

Parameters:
path - The Path to the input data
ptype - The PType to use for processing the data
linesPerTask - The number of lines from the input each map task will process

NLineFileSource

public NLineFileSource(List<org.apache.hadoop.fs.Path> paths,
                       PType<T> ptype,
                       int linesPerTask)
Create a new NLineFileSource instance.

Parameters:
paths - The Paths to the input data
ptype - The PType to use for processing the data
linesPerTask - The number of lines from the input each map task will process
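
When choosing linesPerTask for several input paths, it can help to estimate how many map tasks will result. The sketch below assumes that lines are grouped separately for each file, so a group never spans two files; this matches the author's understanding of Hadoop's NLineInputFormat but is stated here as an assumption, not as documented Crunch behaviour. MapTaskEstimate and its methods are hypothetical names, and the line counts are made-up inputs.

```java
// Hypothetical helper, not part of Crunch or Hadoop: estimates how many map
// tasks result from a given linesPerTask, assuming per-file grouping.
final class MapTaskEstimate {

  // Number of groups needed for one file: ceil(lineCount / linesPerTask).
  static long tasksForFile(long lineCount, int linesPerTask) {
    if (lineCount < 0 || linesPerTask <= 0) {
      throw new IllegalArgumentException("need lineCount >= 0 and linesPerTask > 0");
    }
    return (lineCount + linesPerTask - 1) / linesPerTask;
  }

  // Sums the per-file estimates, since groups are assumed not to span files.
  static long tasksForFiles(long[] lineCounts, int linesPerTask) {
    long total = 0;
    for (long lineCount : lineCounts) {
      total += tasksForFile(lineCount, linesPerTask);
    }
    return total;
  }

  public static void main(String[] args) {
    long[] lineCounts = {25, 5}; // made-up line counts for two input files
    System.out.println("estimated map tasks: " + tasksForFiles(lineCounts, 10));
  }
}
```

Under this assumption, two files of 25 and 5 lines with linesPerTask set to 10 give 3 + 1 = 4 map tasks, one more than the 3 that a single 30-line file would need.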

Method Detail

toString

public String toString()
Overrides:
toString in class FileSourceImpl<T>

read

public Iterable<T> read(org.apache.hadoop.conf.Configuration conf)
                 throws IOException
Description copied from interface: ReadableSource
Returns an Iterable that contains the contents of this source.

Specified by:
read in interface ReadableSource<T>
Parameters:
conf - The current Configuration instance
Returns:
the contents of this Source as an Iterable instance
Throws:
IOException

asReadable

public ReadableData<T> asReadable()
Specified by:
asReadable in interface ReadableSource<T>
Returns:
a ReadableData instance containing the data referenced by this ReadableSource.


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.