Search code examples
amazon-dynamodbapache-flinkamazon-dynamodb-streams

Consume DynamoDB streams in Apache Flink


Has anyone tried to consume DynamoDB streams in Apache Flink ?

Flink has a Kinesis consumer. But I am looking for how can i consume the Dynamo stream directly.

DataStream<String> kinesis = env.addSource(new FlinkKinesisConsumer<>(
    "kinesis_stream_name", new SimpleStringSchema(), consumerConfig));

I tried searching a lot, but did not find anything. However found an open request pending the Flink Jira board. So I guess this option is not available yet ? What alternatives do I have ?

Allow FlinkKinesisConsumer to adapt for AWS DynamoDB Streams


Solution

  • UPDATED ANSWER - 2019

    FlinkKinesisConsumer connector can now process a DynamoDB stream after this JIRA ticket is implemented.

    UPDATED ANSWER

    It seems that Apache Flink does not use the DynamoDB stream connector adapter, so it can read data from Kinesis, but it can't read data from DynamoDB.

    I think one option could be implement an app that would write data from DynamoDB streams to Kinesis and then read data from Kinesis in Apache Flink and process it.

    Another option would be to implement custom DynamoDB connector for Apache Flink. You can use existing connector as a starting point.

    Also you can take a look at the Apache Spark Kinesis connector. But it seems that it has the same issue as well.

    ORIGINAL ANSWER

    DynamoDB has a Kinesis adaptor that allow you to consume a stream of DynamoDB updates using Kinesis Client Library. Using Kinesis adaptor is a recommended way (according to AWS) of consuming updates from DynamoDB. This will give you same data as using DynamoDB stream directly (also called DynamoDB low-level API).