Tags: python, json, apache-kafka, aws-lambda, aws-msk

Kafka sends value as a string: How can I deserialize it and turn it into a JSON object with Python?


I am trying to consume Amazon MSK (Amazon's managed Kafka service) messages from a Lambda function; MSK is the trigger of my Lambda.

The producer looks like this:

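from json import dumps
from kafka import KafkaProducer  # assumed: the kafka-python package

data = {'time': 1611215510000000000, 'tags': {'tag1': 'tagvalue'}, 'fields': {'value': 12345}}
# Each value is serialized to a JSON string and encoded as UTF-8 bytes
self.producer = KafkaProducer(
    security_protocol=self.security_protocol,
    bootstrap_servers=self.kafka_servers,
    value_serializer=lambda x: dumps(x).encode('utf-8'))
self.producer.send(kafka_topic, value=data)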

In the lambda function, I receive the following:

{
   "eventSource":"aws:kafka",
   "eventSourceArn":"<arn....>",
   "bootstrapServers":"<serverlist...>",
   "records":{
      "topic-0":[
         {
            "topic":"topic",
            "partition":0,
            "offset":0,
            "timestamp":1611138328871,
            "timestampType":"CREATE_TIME",
            "value":"eyJ0aW1lIjogMTYxMTEzODI4MDAwMDAwMDAwMCwgInRhZ3MiOiB7InN0YXR1cyI6ICJHb29kIn0sICJmaWVsZHMiOiB7InZhbHVlX251bSI6IDAuMCwgInZhbHVlIjogZmFsc2V9fQ=="
         },
         {
            "topic":"topic",
            "partition":0,
            "offset":1,
            "timestamp":1611138330033,
            "timestampType":"CREATE_TIME",
            "value":"eyJ0aW1lIjogMTYxMTEzODI4MDAwMDAwMDAwMCwgInRhZ3MiOiB7InN0YXR1cyI6ICJHb29kIn0sICJmaWVsZHMiOiB7InZhbHVlIjogMTQxMzUuMH19"
         }
    ]
  }
}

I'd like to turn the value strings into JSON objects. How can I do that? I've tried many variants; the one I expected to work raises an exception (Exception: Expecting value: line 1 column 1 (char 0)):

records = event['records']['topic-0']
for record in records:
    print(json.loads(record['value']).decode('utf-8'))

Solution

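  • The value strings are base64-encoded, so decode them first (for example with Python's base64.b64decode) and then parse the result with json.loads. Calling json.loads directly on the base64 text fails because that text is not valid JSON, which is where the "Expecting value" error comes from.

    The first string, decoded using https://www.base64decode.org/: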

    {"time": 1611138280000000000, "tags": {"status": "Good"}, "fields": {"value_num": 0.0, "value": false}}
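The same decoding can be done inside the Lambda function. Below is a minimal sketch, assuming the event has the shape shown in the question; the helper name decode_records and the sample_event variable are made up for illustration, and the two value strings are copied from the question. It loops over every key under "records" rather than hard-coding 'topic-0', which is a design choice on my part and not something the question requires.

```python
import base64
import json


def decode_records(event):
    """Return the JSON-decoded values of all Kafka records in the event.

    Hypothetical helper: assumes event["records"] maps "<topic>-<partition>"
    keys to lists of records whose "value" field is base64-encoded.
    """
    decoded = []
    for records in event["records"].values():
        for record in records:
            # Undo the base64 encoding to get the bytes the producer sent
            raw = base64.b64decode(record["value"])
            # The producer encoded the JSON text as UTF-8, so decode it, then parse
            decoded.append(json.loads(raw.decode("utf-8")))
    return decoded


def lambda_handler(event, context):
    # Entry point of the Lambda function: print each decoded value
    for value in decode_records(event):
        print(value)


# Trimmed copy of the event from the question, used only to try the helper locally
sample_event = {
    "eventSource": "aws:kafka",
    "records": {
        "topic-0": [
            {
                "topic": "topic",
                "partition": 0,
                "offset": 0,
                "value": "eyJ0aW1lIjogMTYxMTEzODI4MDAwMDAwMDAwMCwgInRhZ3MiOiB7InN0YXR1cyI6ICJHb29kIn0sICJmaWVsZHMiOiB7InZhbHVlX251bSI6IDAuMCwgInZhbHVlIjogZmFsc2V9fQ==",
            },
            {
                "topic": "topic",
                "partition": 0,
                "offset": 1,
                "value": "eyJ0aW1lIjogMTYxMTEzODI4MDAwMDAwMDAwMCwgInRhZ3MiOiB7InN0YXR1cyI6ICJHb29kIn0sICJmaWVsZHMiOiB7InZhbHVlIjogMTQxMzUuMH19",
            },
        ]
    },
}

decoded = decode_records(sample_event)
for value in decoded:
    print(value)
```

Each element of decoded is an ordinary Python dict, so fields can be read directly, for example decoded[0]["tags"]["status"].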