twitter, apache-kafka, twitter-streaming-api

What does Kafka do if the producer goes down?


I'm a bit confused about Kafka's architecture. We would like to capture data from the Twitter Streaming API, and we came across this Twitter producer: https://github.com/NFLabs/kafka-twitter/blob/master/src/main/java/com/nflabs/peloton2/kafka/producer/TwitterProducer.java

What I'm trying to work out is how to design the system so that it's fault tolerant.

If the producer goes down, does that mean we lose some of the data? How can we prevent this from happening?


Solution

  • If the producer you linked to stops running, new data from the Twitter API will not make its way into Kafka. Kafka only stores what it is sent, so it cannot recover tweets that were never produced. I'm not sure how the Twitter Streaming API works, but it may be possible to request historical data, which would let you fetch everything back to the point at which the producer failed.

    Another option is to use Kafka Connect, a distributed, fault-tolerant service for connecting data sources and sinks to Kafka. Connect exposes a higher-level API and uses the standard Kafka producer and consumer APIs behind the scenes. In distributed mode, if a worker fails, its tasks are reassigned to the remaining workers, so ingestion resumes without manual intervention. The Connect documentation explains this thoroughly, so give that a read and go from there.
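
To make the first option (backfilling from the point of failure) more concrete, here is a minimal sketch of a checkpoint the producer could keep. It is not part of the linked TwitterProducer; the class name `ProducerCheckpoint` and its methods are hypothetical, and it assumes that tweet IDs increase over time and that Twitter offers some way to request tweets newer than a given ID, which you would need to confirm. It uses only the Java standard library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.OptionalLong;

/** Hypothetical helper: remembers the ID of the last tweet that reached Kafka. */
class ProducerCheckpoint {
    private final Path file;

    ProducerCheckpoint(Path file) {
        this.file = file;
    }

    /** Persist the ID of the last tweet that Kafka acknowledged. */
    void save(long lastTweetId) throws IOException {
        // Write to a temp file first, then move it into place, so a crash
        // in the middle of the write leaves the previous checkpoint intact.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(lastTweetId).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Return the last saved ID, or empty if nothing has been saved yet. */
    OptionalLong load() throws IOException {
        if (!Files.exists(file)) {
            return OptionalLong.empty();
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return text.isEmpty() ? OptionalLong.empty() : OptionalLong.of(Long.parseLong(text));
    }
}
```

The idea would be to call `save` from the producer's send callback once Kafka has acknowledged a tweet, and to call `load` on startup: if an ID is present, everything after it is the gap to backfill before resuming the live stream. Saving only after the acknowledgement means a crash can cause a few duplicates on restart but should not silently skip tweets.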