
Include the key from a Kafka message with a Kafka Connect HDFS sink connector


I'm using the Kafka Connect HDFS sink connector to write to HDFS from Kafka, and it is working fine. My messages look like this:

key: my-key
value: {
"name": "helen"
}

My use case is that I need to append the key of my messages to the events I send to HDFS.

The problem is that the key doesn't appear in the value payload, so I cannot use:

"partitioner.class": 
"io.confluent.connect.hdfs.partitioner.FieldPartitioner", 
"partition.field.name": "key", 

My question is: how can I add the key to the message I send to HDFS, or how can I partition based on the key?


Solution

  • Out of the box, you can't (the same goes for the S3 sink connector); that is just how the connector code is written, not a limitation of the Connect framework

    Edit - For the S3 sink, at least, I think there is now a property to include the keys

    At the very least, you would need to build and add this SMT to your Connect workers; it "moves" the key, topic, and partition over into the "value" of the Connect record before it is written to storage (a rough configuration sketch follows the link below)

    https://github.com/jcustenborder/kafka-connect-transform-archive
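
    As a rough sketch, and assuming the SMT from that repository is built and installed on the workers (the transform class name below is taken from its documentation, and the topic and HDFS URL values are placeholders), the sink config might look something like this:

    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "topics": "my-topic",
    "hdfs.url": "hdfs://namenode:8020",
    "flush.size": "100",
    "transforms": "archive",
    "transforms.archive.type": "com.github.jcustenborder.kafka.connect.archive.Archive",
    "partitioner.class": "io.confluent.connect.hdfs.partitioner.FieldPartitioner",
    "partition.field.name": "key"

    With the key wrapped into the value this way, it ends up in the data written to HDFS; whether FieldPartitioner can then partition on the archived "key" field depends on the struct schema the SMT produces, so that part should be verified against the connector version in use.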