java.lang.Object
  org.apache.crunch.io.impl.SourcePathTargetImpl<T>
    org.apache.crunch.io.impl.ReadableSourcePathTargetImpl<Pair<K,V>>
      org.apache.crunch.io.text.TextFileTableSourceTarget<K,V>
public class TextFileTableSourceTarget<K,V>
A TableSource and SourceTarget implementation that uses the KeyValueTextInputFormat and TextOutputFormat to support reading and writing text files as PTable instances using a tab separator for the keys and the values.
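The tab-separated layout described above can be illustrated without Hadoop or Crunch on the classpath. The sketch below is a minimal stand-in, not the library's code: `TabSeparatedLine`, `parse`, and `format` are hypothetical names, and splitting at the first tab (with a tab-less line becoming a key with an empty value) is an assumption based on Hadoop's documented default for KeyValueTextInputFormat.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical helper, not part of Crunch or Hadoop: models one line of a
// tab-separated key/value text file.
final class TabSeparatedLine {
    private TabSeparatedLine() {}

    // Splits at the FIRST tab only. Assumption: a line with no tab is
    // treated as a key with an empty value.
    static Map.Entry<String, String> parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleImmutableEntry<>(line, "");
        }
        return new SimpleImmutableEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    // Joins a key and a value with a single tab, one record per line.
    static String format(String key, String value) {
        return key + "\t" + value;
    }
}
```

Because only the first tab is treated as the separator in this sketch, values may themselves contain tabs, while keys may not.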
Nested Class Summary |
---|
Nested classes/interfaces inherited from interface org.apache.crunch.Target |
---|
Target.WriteMode |
Field Summary | |
---|---|
protected Source<T> |
source
|
protected Target |
target
|
Constructor Summary | |
---|---|
TextFileTableSourceTarget(org.apache.hadoop.fs.Path path,
PTableType<K,V> tableType)
|
|
TextFileTableSourceTarget(org.apache.hadoop.fs.Path path,
PTableType<K,V> tableType,
FileNamingScheme fileNamingScheme)
|
|
TextFileTableSourceTarget(String path,
PTableType<K,V> tableType)
|
Method Summary | ||
---|---|---|
boolean |
accept(OutputHandler handler,
PType<?> ptype)
Checks to see if this Target instance is compatible with the
given PType . |
|
<S> SourceTarget<S> |
asSourceTarget(PType<S> ptype)
Attempt to create the SourceTarget type that corresponds to this Target
for the given PType , if possible. |
|
SourceTarget<T> |
conf(String key,
String value)
Adds the given key-value pair to the Configuration instance(s) that are used to
read and write this SourceTarget<T> . |
|
void |
configureSource(org.apache.hadoop.mapreduce.Job job,
int inputId)
Configure the given job to use this source as an input. |
|
boolean |
equals(Object other)
|
|
Converter<?,?,?,?> |
getConverter()
Returns the Converter used for mapping the inputs from this instance
into PCollection or PTable values. |
|
Converter<?,?,?,?> |
getConverter(PType<?> ptype)
Returns the Converter to use for mapping from the output PCollection
into the output values expected by this instance. |
|
long |
getLastModifiedAt(org.apache.hadoop.conf.Configuration configuration)
Returns the time (in milliseconds) that this Source was most recently
modified (e.g., because an input file was edited or new files were added to
a directory.) |
|
long |
getSize(org.apache.hadoop.conf.Configuration configuration)
Returns the number of bytes in this Source . |
|
PTableType<K,V> |
getTableType()
|
|
PType<T> |
getType()
Returns the PType for this source. |
|
boolean |
handleExisting(Target.WriteMode strategy,
long lastModifiedAt,
org.apache.hadoop.conf.Configuration conf)
Apply the given WriteMode to this Target instance. |
|
int |
hashCode()
|
|
Source<T> |
inputConf(String key,
String value)
Adds the given key-value pair to the Configuration instance that is used to read
this Source<T> . |
|
Target |
outputConf(String key,
String value)
Adds the given key-value pair to the Configuration instance that is used to write
this Target . |
|
String |
toString()
|
Methods inherited from class org.apache.crunch.io.impl.ReadableSourcePathTargetImpl |
---|
asReadable, read |
Methods inherited from class org.apache.crunch.io.impl.SourcePathTargetImpl |
---|
configureForMapReduce, getFileNamingScheme, getPath, handleOutputs |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Methods inherited from interface org.apache.crunch.SourceTarget |
---|
conf |
Methods inherited from interface org.apache.crunch.Source |
---|
configureSource, getConverter, getLastModifiedAt, getSize, getType, inputConf |
Methods inherited from interface org.apache.crunch.Target |
---|
accept, asSourceTarget, getConverter, handleExisting, outputConf |
Field Detail |
---|
protected final Source<T> source
protected final Target target
Constructor Detail |
---|
public TextFileTableSourceTarget(String path, PTableType<K,V> tableType)
public TextFileTableSourceTarget(org.apache.hadoop.fs.Path path, PTableType<K,V> tableType)
public TextFileTableSourceTarget(org.apache.hadoop.fs.Path path, PTableType<K,V> tableType, FileNamingScheme fileNamingScheme)
Method Detail |
---|
public PTableType<K,V> getTableType()
Specified by: getTableType in interface TableSource<K,V>
public String toString()
public Source<T> inputConf(String key, String value)
Description copied from interface: Source
Adds the given key-value pair to the Configuration instance that is used to read this Source<T>. Allows for multiple inputs to re-use the same config keys with different values when necessary.
Specified by: inputConf in interface Source<T>
public PType<T> getType()
Description copied from interface: Source
Returns the PType for this source.
Specified by: getType in interface Source<T>
public void configureSource(org.apache.hadoop.mapreduce.Job job, int inputId) throws IOException
Description copied from interface: Source
Configure the given job to use this source as an input.
Specified by: configureSource in interface Source<T>
Parameters:
job - The job to configure
inputId - For a multi-input job, an identifier for this input to the job
Throws:
IOException
public long getSize(org.apache.hadoop.conf.Configuration configuration)
Description copied from interface: Source
Returns the number of bytes in this Source.
Specified by: getSize in interface Source<T>
public boolean accept(OutputHandler handler, PType<?> ptype)
Description copied from interface: Target
Checks to see if this Target instance is compatible with the given PType.
Specified by: accept in interface Target
Parameters:
handler - The OutputHandler that is managing the output for the job
ptype - The PType to check
Returns:
true if this Target is compatible with the given PType, false otherwise
public <S> SourceTarget<S> asSourceTarget(PType<S> ptype)
Description copied from interface: Target
Attempt to create the SourceTarget type that corresponds to this Target for the given PType, if possible. If it is not possible, return null.
Specified by: asSourceTarget in interface Target
Parameters:
ptype - The PType to use in constructing the SourceTarget
Returns:
A SourceTarget, or null if such a SourceTarget does not exist
public boolean equals(Object other)
Overrides: equals in class Object
public int hashCode()
Overrides: hashCode in class Object
public Target outputConf(String key, String value)
Description copied from interface: Target
Adds the given key-value pair to the Configuration instance that is used to write this Target. Allows for multiple target outputs to re-use the same config keys with different values when necessary.
Specified by: outputConf in interface Target
public boolean handleExisting(Target.WriteMode strategy, long lastModifiedAt, org.apache.hadoop.conf.Configuration conf)
Description copied from interface: Target
Apply the given WriteMode to this Target instance.
Specified by: handleExisting in interface Target
Parameters:
strategy - The strategy for handling existing outputs
conf - The ever-useful Configuration instance
public long getLastModifiedAt(org.apache.hadoop.conf.Configuration configuration)
Description copied from interface: Source
Returns the time (in milliseconds) that this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory.)
Specified by: getLastModifiedAt in interface Source<T>
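To make the getSize and getLastModifiedAt contracts concrete, here is a rough local-filesystem analogue. It is only a sketch under stated assumptions: it uses java.nio.file instead of Hadoop's FileSystem API, `LocalPathStats` and its methods are hypothetical names, and it is not how Crunch computes these values.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical local-filesystem analogue; Crunch itself works against
// Hadoop paths, which this sketch does not use.
final class LocalPathStats {
    private LocalPathStats() {}

    // Total number of bytes in all regular files under root.
    static long sizeInBytes(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            long total = 0L;
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p)) {
                    total += Files.size(p);
                }
            }
            return total;
        }
    }

    // Latest modification time in milliseconds over root and everything
    // beneath it, so both edited files and newly added files are noticed.
    static long lastModifiedMillis(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            long latest = -1L;
            for (Path p : (Iterable<Path>) paths::iterator) {
                latest = Math.max(latest, Files.getLastModifiedTime(p).toMillis());
            }
            return latest;
        }
    }
}
```

Taking the maximum over the directory and its entries matches the two triggers named in the description: an edited input file changes that file's timestamp, and an added file changes the directory's.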
public Converter<?,?,?,?> getConverter()
Description copied from interface: Source
Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.
Specified by: getConverter in interface Source<T>
public Converter<?,?,?,?> getConverter(PType<?> ptype)
Description copied from interface: Target
Returns the Converter to use for mapping from the output PCollection into the output values expected by this instance.
Specified by: getConverter in interface Target
Parameters:
ptype - The PType of the data that is being written to this instance
Returns:
the Converter for the output represented by this instance
public SourceTarget<T> conf(String key, String value)
Description copied from interface: SourceTarget
Adds the given key-value pair to the Configuration instance(s) that are used to read and write this SourceTarget<T>. Allows for multiple inputs and outputs to re-use the same config keys with different values when necessary.
Specified by: conf in interface SourceTarget<T>
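The difference in scope between inputConf, outputConf, and conf can be sketched with plain maps. Everything below is illustrative: `ScopedConfSketch`, `readView`, and `writeView` are hypothetical names, and layering the overrides on top of a copy of job-level settings is an assumption, since this page does not describe how Crunch applies them internally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a source/target pair; not the Crunch implementation.
final class ScopedConfSketch {
    private final Map<String, String> readOverrides = new HashMap<>();
    private final Map<String, String> writeOverrides = new HashMap<>();

    // Mirrors inputConf: the pair is visible only when reading.
    ScopedConfSketch inputConf(String key, String value) {
        readOverrides.put(key, value);
        return this;
    }

    // Mirrors outputConf: the pair is visible only when writing.
    ScopedConfSketch outputConf(String key, String value) {
        writeOverrides.put(key, value);
        return this;
    }

    // Mirrors conf: the pair is visible for both reading and writing.
    ScopedConfSketch conf(String key, String value) {
        return inputConf(key, value).outputConf(key, value);
    }

    // Assumption: overrides are layered over a copy of the job-level
    // settings, so the shared map is never mutated.
    Map<String, String> readView(Map<String, String> jobConf) {
        Map<String, String> merged = new HashMap<>(jobConf);
        merged.putAll(readOverrides);
        return merged;
    }

    Map<String, String> writeView(Map<String, String> jobConf) {
        Map<String, String> merged = new HashMap<>(jobConf);
        merged.putAll(writeOverrides);
        return merged;
    }
}
```

With two such instances in one job, each can set the same key to a different value through inputConf, which is the reuse the method descriptions refer to.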