This project has retired. For details please refer to its Attic page.
TextFileTableSourceTarget (Apache Crunch 0.9.0 API)

org.apache.crunch.io.text
Class TextFileTableSourceTarget<K,V>

java.lang.Object
  extended by org.apache.crunch.io.impl.SourcePathTargetImpl<T>
      extended by org.apache.crunch.io.impl.ReadableSourcePathTargetImpl<Pair<K,V>>
          extended by org.apache.crunch.io.text.TextFileTableSourceTarget<K,V>
All Implemented Interfaces:
MapReduceTarget, PathTarget, ReadableSource<Pair<K,V>>, ReadableSourceTarget<Pair<K,V>>, Source<Pair<K,V>>, SourceTarget<Pair<K,V>>, TableSource<K,V>, TableSourceTarget<K,V>, Target

public class TextFileTableSourceTarget<K,V>
extends ReadableSourcePathTargetImpl<Pair<K,V>>
implements TableSourceTarget<K,V>

A TableSource and SourceTarget implementation that uses KeyValueTextInputFormat and TextOutputFormat to read and write text files as PTable instances, with a tab character separating each key from its value.
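
The line layout described above can be sketched without any Hadoop or Crunch dependency. The helper below is hypothetical (TabSeparatedLines and its methods are not part of Crunch). It assumes the default Hadoop behavior of KeyValueTextInputFormat, which splits each line at the first tab and treats a line with no tab as a key with an empty value.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

/** Hypothetical helper illustrating the tab-separated line layout; not part of Crunch. */
final class TabSeparatedLines {
    private TabSeparatedLines() {}

    /** Splits a line at the first tab; a line without a tab becomes a key with an empty value. */
    static Map.Entry<String, String> parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleImmutableEntry<>(line, "");
        }
        // Only the first tab separates key from value; later tabs stay in the value.
        return new SimpleImmutableEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    /** Joins a key and a value with a single tab, forming one output line (no newline). */
    static String format(String key, String value) {
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        Map.Entry<String, String> entry = parse("apple\t3");
        System.out.println(entry.getKey() + " -> " + entry.getValue());
        System.out.println(format("pear", "7"));
    }
}
```

Because only the first tab is treated as the separator, a value may itself contain tabs, but a key may not.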


Nested Class Summary
 
Nested classes/interfaces inherited from interface org.apache.crunch.Target
Target.WriteMode
 
Field Summary
protected  Source<T> source
           
protected  Target target
           
 
Constructor Summary
TextFileTableSourceTarget(org.apache.hadoop.fs.Path path, PTableType<K,V> tableType)
           
TextFileTableSourceTarget(org.apache.hadoop.fs.Path path, PTableType<K,V> tableType, FileNamingScheme fileNamingScheme)
           
TextFileTableSourceTarget(String path, PTableType<K,V> tableType)
           
 
Method Summary
 boolean accept(OutputHandler handler, PType<?> ptype)
          Checks to see if this Target instance is compatible with the given PType.
<S> SourceTarget<S>
asSourceTarget(PType<S> ptype)
          Attempt to create the SourceTarget type that corresponds to this Target for the given PType, if possible.
 SourceTarget<T> conf(String key, String value)
          Adds the given key-value pair to the Configuration instance(s) that are used to read and write this SourceTarget<T>.
 void configureSource(org.apache.hadoop.mapreduce.Job job, int inputId)
          Configure the given job to use this source as an input.
 boolean equals(Object other)
           
 Converter<?,?,?,?> getConverter()
          Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.
 Converter<?,?,?,?> getConverter(PType<?> ptype)
          Returns the Converter to use for mapping from the output PCollection into the output values expected by this instance.
 long getLastModifiedAt(org.apache.hadoop.conf.Configuration configuration)
          Returns the time (in milliseconds) that this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory).
 long getSize(org.apache.hadoop.conf.Configuration configuration)
          Returns the number of bytes in this Source.
 PTableType<K,V> getTableType()
           
 PType<T> getType()
          Returns the PType for this source.
 boolean handleExisting(Target.WriteMode strategy, long lastModifiedAt, org.apache.hadoop.conf.Configuration conf)
          Apply the given WriteMode to this Target instance.
 int hashCode()
           
 Source<T> inputConf(String key, String value)
          Adds the given key-value pair to the Configuration instance that is used to read this Source<T>.
 Target outputConf(String key, String value)
          Adds the given key-value pair to the Configuration instance that is used to write this Target.
 String toString()
           
 
Methods inherited from class org.apache.crunch.io.impl.ReadableSourcePathTargetImpl
asReadable, read
 
Methods inherited from class org.apache.crunch.io.impl.SourcePathTargetImpl
configureForMapReduce, getFileNamingScheme, getPath, handleOutputs
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.apache.crunch.SourceTarget
conf
 
Methods inherited from interface org.apache.crunch.Source
configureSource, getConverter, getLastModifiedAt, getSize, getType, inputConf
 
Methods inherited from interface org.apache.crunch.Target
accept, asSourceTarget, getConverter, handleExisting, outputConf
 

Field Detail

source

protected final Source<T> source

target

protected final Target target
Constructor Detail

TextFileTableSourceTarget

public TextFileTableSourceTarget(String path,
                                 PTableType<K,V> tableType)

TextFileTableSourceTarget

public TextFileTableSourceTarget(org.apache.hadoop.fs.Path path,
                                 PTableType<K,V> tableType)

TextFileTableSourceTarget

public TextFileTableSourceTarget(org.apache.hadoop.fs.Path path,
                                 PTableType<K,V> tableType,
                                 FileNamingScheme fileNamingScheme)
Method Detail

getTableType

public PTableType<K,V> getTableType()
Specified by:
getTableType in interface TableSource<K,V>

toString

public String toString()

inputConf

public Source<T> inputConf(String key,
                           String value)
Description copied from interface: Source
Adds the given key-value pair to the Configuration instance that is used to read this Source<T>. Allows for multiple inputs to re-use the same config keys with different values when necessary.

Specified by:
inputConf in interface Source<T>
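
One way to picture per-source configuration is a set of overrides that each source carries and applies on top of the shared job settings. The sketch below is an illustration only and is not how Crunch implements inputConf. ScopedSource, effectiveConf, and the key "example.separator" are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a source that carries its own configuration overrides. */
final class ScopedSource {
    private final Map<String, String> overrides = new HashMap<>();

    /** Records an override and returns this source, so calls can be chained. */
    ScopedSource inputConf(String key, String value) {
        overrides.put(key, value);
        return this;
    }

    /** Returns a copy of the shared settings with this source's overrides applied on top. */
    Map<String, String> effectiveConf(Map<String, String> shared) {
        Map<String, String> merged = new HashMap<>(shared);
        merged.putAll(overrides);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();
        shared.put("example.separator", "\t");

        // Two inputs reuse the same key with different values.
        ScopedSource first = new ScopedSource().inputConf("example.separator", ",");
        ScopedSource second = new ScopedSource().inputConf("example.separator", ";");

        System.out.println(first.effectiveConf(shared).get("example.separator"));
        System.out.println(second.effectiveConf(shared).get("example.separator"));
    }
}
```

Keeping the overrides with each source leaves the shared settings untouched, which is what lets two inputs in one job disagree on the value of a single key.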

getType

public PType<T> getType()
Description copied from interface: Source
Returns the PType for this source.

Specified by:
getType in interface Source<T>

configureSource

public void configureSource(org.apache.hadoop.mapreduce.Job job,
                            int inputId)
                     throws IOException
Description copied from interface: Source
Configure the given job to use this source as an input.

Specified by:
configureSource in interface Source<T>
Parameters:
job - The job to configure
inputId - For a multi-input job, an identifier for this input to the job
Throws:
IOException

getSize

public long getSize(org.apache.hadoop.conf.Configuration configuration)
Description copied from interface: Source
Returns the number of bytes in this Source.

Specified by:
getSize in interface Source<T>

accept

public boolean accept(OutputHandler handler,
                      PType<?> ptype)
Description copied from interface: Target
Checks to see if this Target instance is compatible with the given PType.

Specified by:
accept in interface Target
Parameters:
handler - The OutputHandler that is managing the output for the job
ptype - The PType to check
Returns:
True if this Target can write data in the form of the given PType, false otherwise

asSourceTarget

public <S> SourceTarget<S> asSourceTarget(PType<S> ptype)
Description copied from interface: Target
Attempt to create the SourceTarget type that corresponds to this Target for the given PType, if possible. If it is not possible, return null.

Specified by:
asSourceTarget in interface Target
Parameters:
ptype - The PType to use in constructing the SourceTarget
Returns:
A new SourceTarget or null if such a SourceTarget does not exist

equals

public boolean equals(Object other)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object

outputConf

public Target outputConf(String key,
                         String value)
Description copied from interface: Target
Adds the given key-value pair to the Configuration instance that is used to write this Target. Allows for multiple target outputs to re-use the same config keys with different values when necessary.

Specified by:
outputConf in interface Target

handleExisting

public boolean handleExisting(Target.WriteMode strategy,
                              long lastModifiedAt,
                              org.apache.hadoop.conf.Configuration conf)
Description copied from interface: Target
Apply the given WriteMode to this Target instance.

Specified by:
handleExisting in interface Target
Parameters:
strategy - The strategy for handling existing outputs
conf - The ever-useful Configuration instance
Returns:
true if the target did exist

getLastModifiedAt

public long getLastModifiedAt(org.apache.hadoop.conf.Configuration configuration)
Description copied from interface: Source
Returns the time (in milliseconds) that this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory).

Specified by:
getLastModifiedAt in interface Source<T>
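
For a path-backed source, getSize and getLastModifiedAt can be thought of as aggregates over the files under the path. The sketch below shows that idea on the local file system with java.nio.file. It is an assumption-laden illustration, not Crunch's implementation, which works against a Hadoop Configuration. LocalPathStats and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical local-file-system analogue of a source's size and modification time. */
final class LocalPathStats {
    private LocalPathStats() {}

    /** Lists every regular file at or below the given path. */
    static List<Path> regularFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    /** Sums the sizes, in bytes, of all regular files under the path. */
    static long sizeInBytes(Path root) throws IOException {
        long total = 0L;
        for (Path file : regularFiles(root)) {
            total += Files.size(file);
        }
        return total;
    }

    /** Returns the latest modification time in milliseconds, or -1 if there are no files. */
    static long lastModifiedAt(Path root) throws IOException {
        long latest = -1L;
        for (Path file : regularFiles(root)) {
            latest = Math.max(latest, Files.getLastModifiedTime(file).toMillis());
        }
        return latest;
    }
}
```

Taking the maximum over all files matches the description above: editing one input file or adding a new file to the directory both move the reported time forward.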

getConverter

public Converter<?,?,?,?> getConverter()
Description copied from interface: Source
Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.

Specified by:
getConverter in interface Source<T>

getConverter

public Converter<?,?,?,?> getConverter(PType<?> ptype)
Description copied from interface: Target
Returns the Converter to use for mapping from the output PCollection into the output values expected by this instance.

Specified by:
getConverter in interface Target
Parameters:
ptype - The PType of the data that is being written to this instance
Returns:
A valid Converter for the output represented by this instance

conf

public SourceTarget<T> conf(String key,
                            String value)
Description copied from interface: SourceTarget
Adds the given key-value pair to the Configuration instance(s) that are used to read and write this SourceTarget<T>. Allows for multiple inputs and outputs to re-use the same config keys with different values when necessary.

Specified by:
conf in interface SourceTarget<T>


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.