public class KafkaSource extends Object implements TableSource<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>, ReadableSource<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>
The values retrieved from Kafka are returned as raw bytes wrapped in a BytesWritable. If callers need parsing logic that is specific to a topic, they are encouraged to create a separate KafkaSource per topic and use a dedicated DoFn to parse each payload.
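As a sketch of the kind of connection properties such a source expects, the snippet below builds a plain `java.util.Properties` object. The broker address and group id are hypothetical placeholders, not values from this page; the property names are standard Kafka consumer settings.

```java
import java.util.Properties;

public class KafkaSourceProps {
    // Builds a minimal set of Kafka connection properties. The property
    // names are standard Kafka consumer settings; the values are placeholders.
    static Properties connectionProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker1:9092");
        props.setProperty("group.id", "crunch-kafka-example");
        // Any key.deserializer / value.deserializer entries would not be
        // honored: KafkaSource always wraps payloads with its own
        // BytesDeserializer (see the constructor notes on
        // ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG).
        return props;
    }

    public static void main(String[] args) {
        System.out.println(connectionProps().getProperty("bootstrap.servers"));
    }
}
```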
Modifier and Type | Class and Description
---|---
static class | KafkaSource.BytesDeserializer: Basic Deserializer which simply wraps the payload as a BytesWritable.
Modifier and Type | Field and Description
---|---
static long | CONSUMER_POLL_TIMEOUT_DEFAULT: Default timeout value for CONSUMER_POLL_TIMEOUT_KEY of 1 second.
static String | CONSUMER_POLL_TIMEOUT_KEY: Constant to indicate how long the reader waits before timing out when retrieving data from Kafka.
Constructor and Description
---
KafkaSource(Properties kafkaConnectionProperties, Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> offsets): Constructs a Kafka source that will read data from the Kafka cluster identified by the kafkaConnectionProperties and from the specific topics and partitions identified in the offsets.
Modifier and Type | Method and Description
---|---
ReadableData<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> | asReadable()
void | configureSource(org.apache.hadoop.mapreduce.Job job, int inputId): Configure the given job to use this source as an input.
Converter<?,?,?,?> | getConverter(): Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.
long | getLastModifiedAt(org.apache.hadoop.conf.Configuration configuration): Returns the time (in milliseconds) that this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory).
long | getSize(org.apache.hadoop.conf.Configuration configuration): Returns the number of bytes in this Source.
PTableType<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> | getTableType()
PType<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> | getType(): Returns the PType for this source.
Source<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> | inputConf(String key, String value): Adds the given key-value pair to the Configuration instance that is used to read this Source<T>.
Iterable<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> | read(org.apache.hadoop.conf.Configuration conf): Returns an Iterable that contains the contents of this source.
String | toString()
public static final String CONSUMER_POLL_TIMEOUT_KEY

Constant to indicate how long the reader waits before timing out when retrieving data from Kafka.

public static final long CONSUMER_POLL_TIMEOUT_DEFAULT

Default timeout value for CONSUMER_POLL_TIMEOUT_KEY of 1 second.

public KafkaSource(Properties kafkaConnectionProperties, Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> offsets)
Constructs a Kafka source that will read data from the Kafka cluster identified by the kafkaConnectionProperties and from the specific topics and partitions identified in the offsets.

Parameters:
kafkaConnectionProperties - The connection properties for reading from Kafka. These properties will be honored, with the exception of ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG and ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG.
offsets - A map of TopicPartition to a pair of start and end offsets, respectively. The start and end offsets are evaluated as [start, end), where the ending offset is excluded. Each TopicPartition must have a non-null pair describing its offsets. The start offset should be less than the end offset; if the values are equal, or start is greater than end, that partition will be skipped.

public Source<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> inputConf(String key, String value)
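The [start, end) rule described for the offsets parameter can be sketched in plain Java, with no Kafka dependencies. `isReadable` and `recordCount` are illustrative helpers, not part of the KafkaSource API.

```java
public class OffsetRangeRule {
    // A partition is read only when start < end; the end offset itself is
    // excluded. Equal offsets, or start > end, mean the partition is skipped.
    static boolean isReadable(long start, long end) {
        return start < end;
    }

    // Number of records consumed from a [start, end) range; 0 if skipped.
    static long recordCount(long start, long end) {
        return isReadable(start, end) ? end - start : 0L;
    }

    public static void main(String[] args) {
        System.out.println(isReadable(0L, 100L));  // true: offsets 0..99 are read
        System.out.println(isReadable(50L, 50L));  // false: equal offsets, skipped
        System.out.println(isReadable(75L, 10L));  // false: start > end, skipped
    }
}
```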
Description copied from interface: Source

Adds the given key-value pair to the Configuration instance that is used to read this Source<T>. Allows for multiple inputs to re-use the same config keys with different values when necessary.

public PType<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> getType()
Description copied from interface: Source

Returns the PType for this source.

public Converter<?,?,?,?> getConverter()
Description copied from interface: Source

Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.

Specified by: getConverter in interface Source<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>
public PTableType<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> getTableType()
Specified by: getTableType in interface TableSource<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>
public long getSize(org.apache.hadoop.conf.Configuration configuration)
Description copied from interface: Source

Returns the number of bytes in this Source.

public long getLastModifiedAt(org.apache.hadoop.conf.Configuration configuration)
Description copied from interface: Source

Returns the time (in milliseconds) that this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory).

Specified by: getLastModifiedAt in interface Source<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>
public Iterable<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> read(org.apache.hadoop.conf.Configuration conf) throws IOException
Description copied from interface: ReadableSource

Returns an Iterable that contains the contents of this source.

Specified by: read in interface ReadableSource<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>

Parameters:
conf - The current Configuration instance

Returns:
the contents of this Source as an Iterable instance

Throws:
IOException
public void configureSource(org.apache.hadoop.mapreduce.Job job, int inputId) throws IOException
Description copied from interface: Source

Configure the given job to use this source as an input.

Specified by: configureSource in interface Source<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>

Parameters:
job - The job to configure
inputId - For a multi-input job, an identifier for this input to the job

Throws:
IOException
public ReadableData<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> asReadable()
Specified by: asReadable in interface ReadableSource<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>

Returns:
a ReadableData instance containing the data referenced by this ReadableSource.

Copyright © 2017 The Apache Software Foundation. All rights reserved.