We are building a Flink application which will be deployed to AWS Kinesis Data Analytics (KDA). This application will consume from Kafka and write to S3. Our setup is as follows: we have 10 topics (topic 1 through topic 10) and 5 Flink applications (app 1 through app 5). App 1 will consume from topic 1 and topic 2, app 2 will consume from topic 3 and topic 4, and so on.
We want to be able to reassign a topic to a different application at runtime. Take topic 4 for example: we would update our config system to point app 4, which is consuming from topic 7 and topic 8, to instead consume from topic 7 and topic 4.
Is there any way to do this? As far as my research goes, the only way to make a Flink app read from a new topic is to redeploy it, but I want to check whether someone has figured out another way.
Conversely: would this situation be handled automatically if we made all 5 Flink applications listen to all 10 topics? That is, if there is a sudden surge in one of the topics, will the Flink applications rebalance themselves to dedicate more resources to the hot topic, since they are all part of the same consumer group?
Flink's Kafka consumer does not support stopping consumption from a topic (without a restart), but it does support dynamic topic and partition discovery. See https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/datastream/kafka/#dynamic-partition-discovery for details.
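As a sketch of how dynamic discovery is enabled: with the KafkaSource API (Flink 1.14+), you subscribe with a topic pattern and set a discovery interval, so any new topic whose name matches the pattern is picked up without a redeploy. The broker address, pattern, and group id below are placeholders; note that discovery can only add topics and partitions, it never drops ones already being read.

```java
import java.util.regex.Pattern;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DynamicTopicJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")                 // placeholder
                // Subscribe by pattern: topics created later that match
                // this regex are discovered automatically.
                .setTopicPattern(Pattern.compile("app4-topic-.*"))  // placeholder
                .setGroupId("app-4")                                // placeholder
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                // Check for new topics/partitions every 60 seconds;
                // discovery is off unless this property is set.
                .setProperty("partition.discovery.interval.ms", "60000")
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source")
           .print();

        env.execute("dynamic-topic-discovery");
    }
}
```

With this in place, routing a topic to a given app becomes a naming convention: create (or mirror to) a topic matching that app's pattern, and the running job starts reading it at the next discovery interval. Stopping consumption of a topic still requires a restart, as noted above.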