public class KafkaInputFormat
extends org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>
implements org.apache.hadoop.conf.Configurable

An InputFormat for reading data from Kafka. Keys and values are read in their raw byte form, each wrapped in a BytesWritable instance.
Populating the configuration of the input format is handled with the convenience method writeOffsetsToConfiguration(Map, Configuration). This should be done to ensure the Kafka offset information is available when the input format creates its splits and readers.
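To illustrate the round trip described above, the sketch below stores start and end offsets for each topic partition under flat string keys and reads them back, the way an InputFormat would when building splits. A plain Map stands in for the Hadoop Configuration (which is likewise a string key/value store), and the key layout is hypothetical; the real key names are an internal detail of KafkaInputFormat.

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetConfigSketch {
    // Hypothetical key layout; KafkaInputFormat's actual config keys are internal.
    static String key(String topic, int partition, String bound) {
        return "kafka.offsets." + topic + "." + partition + "." + bound;
    }

    // Analogous to writeOffsetsToConfiguration: persist start/end offsets per partition.
    // Map keys here use a simple "topic-partition" form for the sketch.
    static void writeOffsets(Map<String, Long[]> offsets, Map<String, String> conf) {
        for (Map.Entry<String, Long[]> e : offsets.entrySet()) {
            String[] tp = e.getKey().split("-");
            int partition = Integer.parseInt(tp[1]);
            conf.put(key(tp[0], partition, "start"), String.valueOf(e.getValue()[0]));
            conf.put(key(tp[0], partition, "end"), String.valueOf(e.getValue()[1]));
        }
    }

    // Analogous to getOffsets: read the offsets back, failing loudly when a
    // partition's range was never written (mirroring the IllegalStateException
    // documented for getOffsets).
    static long[] readOffsets(String topic, int partition, Map<String, String> conf) {
        String start = conf.get(key(topic, partition, "start"));
        String end = conf.get(key(topic, partition, "end"));
        if (start == null || end == null) {
            throw new IllegalStateException("Offsets not set for " + topic + "/" + partition);
        }
        return new long[] { Long.parseLong(start), Long.parseLong(end) };
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        Map<String, Long[]> offsets = new HashMap<>();
        offsets.put("events-0", new Long[] { 0L, 500L });
        writeOffsets(offsets, conf);
        long[] readBack = readOffsets("events", 0, conf);
        System.out.println(readBack[0] + "," + readBack[1]);
    }
}
```

Writing the offsets before job submission is what guarantees that getSplits and the record readers see the same partition ranges on every node.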
To suppress warnings generated by unused configs in the ConsumerConfig, one can use tagExistingKafkaConnectionProperties and generateConnectionPropertyKey to prefix Kafka connection properties with "org.apache.crunch.kafka.connection.properties", allowing retrieval later using getConnectionPropertyFromKey and filterConnectionProperties.

| Constructor and Description |
|---|
| KafkaInputFormat() |
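The tagging scheme described above can be sketched with plain java.util.Properties. The prefix string is the one named on this page; the tag and filter helpers below are simplified stand-ins for tagExistingKafkaConnectionProperties and filterConnectionProperties, not the Crunch implementations.

```java
import java.util.Properties;

public class ConnectionPropertySketch {
    // Prefix named in the KafkaInputFormat documentation.
    static final String PREFIX = "org.apache.crunch.kafka.connection.properties";

    // Stand-in for tagExistingKafkaConnectionProperties: prefix every key so
    // ConsumerConfig no longer sees (and warns about) unrecognized entries.
    static Properties tag(Properties connectionProperties) {
        Properties tagged = new Properties();
        for (String name : connectionProperties.stringPropertyNames()) {
            tagged.setProperty(PREFIX + "." + name, connectionProperties.getProperty(name));
        }
        return tagged;
    }

    // Stand-in for filterConnectionProperties: keep only tagged keys and strip
    // the prefix, recovering the original connection properties.
    static Properties filter(Properties props) {
        Properties filtered = new Properties();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(PREFIX + ".")) {
                filtered.setProperty(name.substring(PREFIX.length() + 1), props.getProperty(name));
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Properties conn = new Properties();
        conn.setProperty("bootstrap.servers", "broker1:9092");
        Properties tagged = tag(conn);
        Properties recovered = filter(tagged);
        System.out.println(recovered.getProperty("bootstrap.servers"));
    }
}
```

Because the tagged keys no longer collide with consumer config names, they can travel through a shared Configuration or FormatBundle and be recovered intact on the other side.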
| Modifier and Type | Method and Description |
|---|---|
| org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext) |
| static Properties | filterConnectionProperties(Properties props): Filters out Kafka connection properties that were tagged using generateConnectionPropertyKey. |
| org.apache.hadoop.conf.Configuration | getConf() |
| static Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> | getOffsets(org.apache.hadoop.conf.Configuration configuration): Reads the configuration to determine which topics, partitions, and offsets should be used for reading data. |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) |
| void | setConf(org.apache.hadoop.conf.Configuration configuration) |
| static Properties | tagExistingKafkaConnectionProperties(Properties connectionProperties): Generates a Properties object containing the properties in connectionProperties, but with every property prefixed with "org.apache.crunch.kafka.connection.properties". |
| static void | writeConnectionPropertiesToBundle(Properties connectionProperties, FormatBundle bundle): Writes the Kafka connection properties to the bundle. |
| static void | writeOffsetsToBundle(Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> offsets, FormatBundle bundle): Writes the start and end offsets for the provided topic partitions to the bundle. |
| static void | writeOffsetsToConfiguration(Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> offsets, org.apache.hadoop.conf.Configuration config): Writes the start and end offsets for the provided topic partitions to the config. |
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException, InterruptedException
Specified by: getSplits in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>
Throws: IOException, InterruptedException
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException
Specified by: createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>
Throws: IOException, InterruptedException
public void setConf(org.apache.hadoop.conf.Configuration configuration)
Specified by: setConf in interface org.apache.hadoop.conf.Configurable
public org.apache.hadoop.conf.Configuration getConf()
Specified by: getConf in interface org.apache.hadoop.conf.Configurable
public static void writeOffsetsToBundle(Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> offsets, FormatBundle bundle)
Writes the start and end offsets for the provided topic partitions to the bundle.
Parameters:
offsets - The starting and ending offsets for the topics and partitions.
bundle - the bundle into which the information should be persisted.

public static void writeOffsetsToConfiguration(Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> offsets, org.apache.hadoop.conf.Configuration config)
Writes the start and end offsets for the provided topic partitions to the config.
Parameters:
offsets - The starting and ending offsets for the topics and partitions.
config - the config into which the information should be persisted.

public static Map<org.apache.kafka.common.TopicPartition,Pair<Long,Long>> getOffsets(org.apache.hadoop.conf.Configuration configuration)
Reads the configuration to determine which topics, partitions, and offsets should be used for reading data.
Parameters:
configuration - the configuration to derive the data to read.
Returns:
a map of TopicPartition to a pair of start and end offsets.
Throws:
IllegalStateException - if the configuration does not have the start and end offsets set properly for a partition.

public static void writeConnectionPropertiesToBundle(Properties connectionProperties, FormatBundle bundle)
Writes the Kafka connection properties to the bundle.
Parameters:
connectionProperties - the Kafka connection properties
bundle - the bundle into which the information should be persisted.

public static Properties tagExistingKafkaConnectionProperties(Properties connectionProperties)
Generates a Properties object containing the properties in connectionProperties, but with every property prefixed with "org.apache.crunch.kafka.connection.properties".
Parameters:
connectionProperties - the properties to be prefixed with "org.apache.crunch.kafka.connection.properties"
Returns:
a Properties object representing Kafka connection properties

public static Properties filterConnectionProperties(Properties props)
Filters out Kafka connection properties that were tagged using generateConnectionPropertyKey.
Parameters:
props - the properties to be filtered.
Returns:
the properties whose keys were tagged using generateConnectionPropertyKey(String).

Copyright © 2017 The Apache Software Foundation. All rights reserved.