public class KafkaSource extends Object implements TableSource<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>, ReadableSource<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>
 The values retrieved from Kafka are returned as raw bytes inside a BytesWritable. If callers
 need parsing logic that is specific to a topic, they are encouraged to create a separate KafkaSource
 per topic and apply a dedicated DoFn to parse each payload, as sketched below.
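For example, here is a minimal sketch of that pattern for a single topic, assuming a hypothetical broker at broker1:9092, a topic named events carrying UTF-8 text payloads, and an illustrative offset range and output path:

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pair;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.kafka.KafkaSource;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.io.BytesWritable;
import org.apache.kafka.common.TopicPartition;

public class KafkaSourceExample {

  // Illustrative DoFn that decodes the raw payload of one topic as UTF-8 text.
  static class ParseUtf8Fn extends DoFn<Pair<BytesWritable, BytesWritable>, String> {
    @Override
    public void process(Pair<BytesWritable, BytesWritable> record, Emitter<String> emitter) {
      // copyBytes() trims the backing array to the writable's actual length.
      emitter.emit(new String(record.second().copyBytes(), StandardCharsets.UTF_8));
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(KafkaSourceExample.class);

    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker address

    // Read offsets 0 (inclusive) through 1000 (exclusive) of partition 0 of "events".
    Map<TopicPartition, Pair<Long, Long>> offsets =
        Collections.singletonMap(new TopicPartition("events", 0), Pair.of(0L, 1000L));

    PCollection<String> parsed = pipeline
        .read(new KafkaSource(props, offsets))
        .parallelDo(new ParseUtf8Fn(), Writables.strings());

    pipeline.writeTextFile(parsed, "/tmp/parsed-events"); // illustrative output path
    pipeline.done();
  }
}
```

A second topic would get its own KafkaSource and its own parsing DoFn in the same pipeline.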
| Modifier and Type | Class and Description | 
|---|---|
| static class  | KafkaSource.BytesDeserializer  Basic Deserializer which simply wraps the payload as a BytesWritable. | 
| Modifier and Type | Field and Description | 
|---|---|
| static long | CONSUMER_POLL_TIMEOUT_DEFAULT  Default timeout value for CONSUMER_POLL_TIMEOUT_KEY of 1 second. | 
| static String | CONSUMER_POLL_TIMEOUT_KEY  Constant to indicate how long the reader waits before timing out when retrieving data from Kafka. | 
| Constructor and Description | 
|---|
| KafkaSource(Properties kafkaConnectionProperties, Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> offsets)  Constructs a Kafka source that will read data from the Kafka cluster identified by the kafkaConnectionProperties and from the specific topics and partitions identified in the offsets. | 
| Modifier and Type | Method and Description | 
|---|---|
| ReadableData<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> | asReadable() | 
| void | configureSource(org.apache.hadoop.mapreduce.Job job, int inputId)  Configure the given job to use this source as an input. | 
| Converter<?,?,?,?> | getConverter()  Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values. | 
| long | getLastModifiedAt(org.apache.hadoop.conf.Configuration configuration)  Returns the time (in milliseconds) that this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory). | 
| long | getSize(org.apache.hadoop.conf.Configuration configuration)  Returns the number of bytes in this Source. | 
| PTableType<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> | getTableType() | 
| PType<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> | getType()  Returns the PType for this source. | 
| Source<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> | inputConf(String key, String value)  Adds the given key-value pair to the Configuration instance that is used to read this Source<T>. | 
| Iterable<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> | read(org.apache.hadoop.conf.Configuration conf)  Returns an Iterable that contains the contents of this source. | 
| String | toString() | 
public static final String CONSUMER_POLL_TIMEOUT_KEY
Constant to indicate how long the reader waits before timing out when retrieving data from Kafka.
public static final long CONSUMER_POLL_TIMEOUT_DEFAULT
Default timeout value for CONSUMER_POLL_TIMEOUT_KEY of 1 second.
public KafkaSource(Properties kafkaConnectionProperties, Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> offsets)
Constructs a Kafka source that will read data from the Kafka cluster identified by the kafkaConnectionProperties and from the specific topics and partitions identified in the offsets.
Parameters:
kafkaConnectionProperties - The connection properties for reading from Kafka. These properties will be honored, with the exception of ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG and ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG.
offsets - A map of TopicPartition to a pair of start and end offsets, respectively. The offsets are evaluated as [start, end), where the ending offset is excluded. Each TopicPartition must have a non-null pair describing its offsets. The start offset should be less than the end offset; if the values are equal, or the start is greater than the end, that partition will be skipped. A sketch of building such a map follows.
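A minimal sketch that exercises these rules; the topic name and offset values are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.crunch.Pair;
import org.apache.kafka.common.TopicPartition;

public class OffsetsExample {
  static Map<TopicPartition, Pair<Long, Long>> buildOffsets() {
    Map<TopicPartition, Pair<Long, Long>> offsets = new HashMap<>();
    // Partition 0: read offsets 500 (inclusive) through 1500 (exclusive).
    offsets.put(new TopicPartition("events", 0), Pair.of(500L, 1500L));
    // Partition 1: start == end, so this partition will be skipped entirely.
    offsets.put(new TopicPartition("events", 1), Pair.of(2000L, 2000L));
    return offsets;
  }
}
```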
public Source<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> inputConf(String key, String value)
Adds the given key-value pair to the Configuration instance that is used to read this Source<T>. Allows for multiple inputs to re-use the same config keys with different values when necessary.
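As a sketch of that re-use, the two sources below set different values for the same CONSUMER_POLL_TIMEOUT_KEY, assuming the value is milliseconds encoded as a string; the topics, offsets, and timeout values are illustrative:

```java
import java.util.Map;
import java.util.Properties;

import org.apache.crunch.Pair;
import org.apache.crunch.Source;
import org.apache.crunch.kafka.KafkaSource;
import org.apache.hadoop.io.BytesWritable;
import org.apache.kafka.common.TopicPartition;

public class InputConfExample {
  static void configure(Properties props,
                        Map<TopicPartition, Pair<Long, Long>> eventsOffsets,
                        Map<TopicPartition, Pair<Long, Long>> metricsOffsets) {
    // Each source keeps its own value for the same key, so a slow "metrics"
    // topic can get a longer poll timeout than the "events" topic.
    Source<Pair<BytesWritable, BytesWritable>> events =
        new KafkaSource(props, eventsOffsets)
            .inputConf(KafkaSource.CONSUMER_POLL_TIMEOUT_KEY, "2000");
    Source<Pair<BytesWritable, BytesWritable>> metrics =
        new KafkaSource(props, metricsOffsets)
            .inputConf(KafkaSource.CONSUMER_POLL_TIMEOUT_KEY, "30000");
    // Both sources would then be passed to Pipeline.read(...) as usual.
  }
}
```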
public PType<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> getType()
Returns the PType for this source.
public Converter<?,?,?,?> getConverter()
Returns the Converter used for mapping the inputs from this instance into PCollection or PTable values.
Specified by: getConverter in interface Source<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>
public PTableType<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> getTableType()
Specified by: getTableType in interface TableSource<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>
public long getSize(org.apache.hadoop.conf.Configuration configuration)
Returns the number of bytes in this Source.
public long getLastModifiedAt(org.apache.hadoop.conf.Configuration configuration)
Returns the time (in milliseconds) that this Source was most recently modified (e.g., because an input file was edited or new files were added to a directory).
Specified by: getLastModifiedAt in interface Source<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>
public Iterable<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> read(org.apache.hadoop.conf.Configuration conf) throws IOException
Returns an Iterable that contains the contents of this source.
Specified by: read in interface ReadableSource<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>
Parameters: conf - The current Configuration instance
Returns: the contents of this Source as an Iterable instance
Throws: IOException
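A sketch of using read to spot-check the configured ranges without running a full pipeline, assuming props and offsets are built as in the earlier examples:

```java
import java.io.IOException;
import java.util.Map;
import java.util.Properties;

import org.apache.crunch.Pair;
import org.apache.crunch.kafka.KafkaSource;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.BytesWritable;
import org.apache.kafka.common.TopicPartition;

public class ReadExample {
  static void dump(Properties props, Map<TopicPartition, Pair<Long, Long>> offsets)
      throws IOException {
    KafkaSource source = new KafkaSource(props, offsets);
    // Iterate the configured [start, end) ranges directly.
    for (Pair<BytesWritable, BytesWritable> record : source.read(new Configuration())) {
      System.out.println(record.second().getLength() + " payload bytes");
    }
  }
}
```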
public void configureSource(org.apache.hadoop.mapreduce.Job job, int inputId) throws IOException
Configure the given job to use this source as an input.
Specified by: configureSource in interface Source<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>
Parameters: job - The job to configure; inputId - For a multi-input job, an identifier for this input to the job
Throws: IOException
public ReadableData<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> asReadable()
Specified by: asReadable in interface ReadableSource<Pair<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>
Returns: a ReadableData instance containing the data referenced by this ReadableSource.