apache-kafka, iot

Dealing with Kafka Producer connection loss


This is not so much a coding question as an architecture design question for a real-time streaming application. We have the following setup:

  • Multiple embedded IoT devices in the field (so low memory, but with the option of some extended local storage).
  • They all stream their data in real time to a Kafka cluster, acting as producers. Post-processing applications act as consumers and store the data in a database.
  • Sometimes these IoT devices lose their connection to one of the nodes in the Kafka cluster, since the network connections in the field are not always reliable. These disconnections can typically last up to a day.

Now I understand that Kafka takes care of nodes (acting as brokers) failing in the cluster, but what if I have a situation where the producer just does not have a good network connection and just cannot publish its data to the Kafka topic because it cannot see it?

We cannot afford to lose any data, but the good news is that we have expandable storage options for the embedded IoT devices, where we could save the data when a device goes offline and then stream it when the connection is back up. Is this something that is recommended with Kafka? In particular I have the following questions:

  1. Does Kafka have a built-in way for the producers to have some kind of offline on-disk (NOT in-memory) storage cache?
  2. How does Kafka deal with messages to topics that just cannot be sent, due to network connectivity issues? Is there a way to just schedule them in a queue and then wait until the connection to the cluster is back up?
  3. What kind of local storage options could I use which I can easily interface with as my on-disk cache?
  4. How about having a redundant local time-series database (on the embedded device's storage) just collecting all the data stream and then have an agent take care of the sending of the data to the Kafka cluster, and then clean the database up when it gets an acknowledgement from the Kafka broker?
  5. Are there any other ways to deal with these situations where the Kafka producers have an intermittent connection to the cluster and can only send the stream data in chunks while connected?

Solution

  • As far as I know, the Kafka producer doesn't provide an offline mode, nor is it able to stream data in chunks. What I suggest is to register a callback for the producer send and, on failure, write the content of the message to local storage. Then have a background thread that picks up everything written to local storage and keeps trying to send it through a producer until it succeeds. This is basically the naive version of your suggestion with a time-series DB on the device. But whether it's the file system or a DB on the device, that's the only approach that meets your needs.
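
To make the store-and-forward idea concrete, here is a minimal Python sketch. It is not part of Kafka and uses no Kafka client API. `DiskSpool`, `publish` and the `send` callable are hypothetical names introduced only for illustration. `send(topic, payload)` stands in for whatever blocking producer call you use, and it is assumed to raise an exception when the broker does not acknowledge the message. SQLite is used as the on-disk store simply because it ships with Python; a plain append-only file would work as well.

```python
import sqlite3
import threading
from typing import Callable

# Hypothetical stand-in for a blocking producer send: raises if not acknowledged.
Send = Callable[[str, bytes], None]


class DiskSpool:
    """Durable FIFO of unsent messages, stored in a SQLite file on the device."""

    def __init__(self, path: str) -> None:
        # check_same_thread=False lets a background drain thread share the
        # connection; the lock serializes every access to it.
        self._db = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock:
            self._db.execute(
                "CREATE TABLE IF NOT EXISTS spool ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "topic TEXT NOT NULL, payload BLOB NOT NULL)"
            )
            self._db.commit()

    def append(self, topic: str, payload: bytes) -> None:
        with self._lock:
            self._db.execute(
                "INSERT INTO spool (topic, payload) VALUES (?, ?)", (topic, payload)
            )
            self._db.commit()

    def pending(self) -> int:
        with self._lock:
            return self._db.execute("SELECT COUNT(*) FROM spool").fetchone()[0]

    def _oldest(self):
        with self._lock:
            return self._db.execute(
                "SELECT id, topic, payload FROM spool ORDER BY id LIMIT 1"
            ).fetchone()

    def _delete(self, row_id: int) -> None:
        with self._lock:
            self._db.execute("DELETE FROM spool WHERE id = ?", (row_id,))
            self._db.commit()

    def drain(self, send: Send) -> int:
        """Resend spooled messages oldest first; stop at the first failure."""
        sent = 0
        while True:
            row = self._oldest()
            if row is None:
                return sent
            row_id, topic, payload = row
            try:
                send(topic, payload)
            except Exception:
                return sent  # still offline; the row stays on disk
            self._delete(row_id)  # delete only after the send succeeded
            sent += 1


def publish(spool: DiskSpool, send: Send, topic: str, payload: bytes) -> bool:
    """Try a live send; on failure, persist the message for a later drain."""
    try:
        send(topic, payload)
        return True
    except Exception:
        spool.append(topic, payload)
        return False
```

A background thread would call `spool.drain(send)` in a loop with a sleep between attempts. Two consequences of this design are worth noting. Delivery is at-least-once: if the device crashes after a successful send but before the row is deleted, the message is sent again, so consumers should tolerate duplicates. Also, live sends can overtake spooled messages; if ordering matters, always `append` first and let `drain` be the only sender, which is essentially the agent design from question 4.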