amazon-s3 apache-kafka transactions streaming distributed-system

Store events directly from Kafka into a database? When or why use S3/HDFS first?


I'm learning about event-streams/event pipelines.

I know what a normal (and simple) pipeline looks like, say something like this, which is very easy to find on the internet:

Kafka -> S3/HDFS/... -> database/datawarehouse

My question is: why don't I see this architecture?

Kafka -> database/datawarehouse

I know why my company uses S3 to store our events before they go to the database, but I would like some additional opinions or points of view, as I haven't worked much in companies with event-stream pipelines. Thanks!


Solution

  • This is one of the architecture diagrams I have created:

    [Image: architecture diagram]

    Note: Here, I am pushing data from Kafka to MongoDB, Hive, and HBase.
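
To make the two shapes from the question concrete, here is a minimal in-memory sketch in Python. It does not use a real Kafka client or real stores: `InMemorySink`, `StagingArea`, `direct_pipeline`, `staged_pipeline`, and `replay` are hypothetical names invented for illustration, and the list of events stands in for a Kafka topic. It only shows the structural difference: writing each event straight to several sinks, versus landing events as batches in a staging area (the S3/HDFS role) and loading sinks from there.

```python
from typing import Dict, Iterable, List

Event = Dict[str, object]


class InMemorySink:
    """Hypothetical stand-in for a store such as MongoDB, Hive, or HBase."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: List[Event] = []

    def write(self, event: Event) -> None:
        self.rows.append(event)


class StagingArea:
    """Hypothetical stand-in for S3/HDFS: events are kept as batches ("files")."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.batches: List[List[Event]] = []
        self._current: List[Event] = []

    def append(self, event: Event) -> None:
        self._current.append(event)
        if len(self._current) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Close the current batch, if it holds anything.
        if self._current:
            self.batches.append(self._current)
            self._current = []


def direct_pipeline(events: Iterable[Event], sinks: List[InMemorySink]) -> None:
    """Kafka -> database: every event is written to every sink as it arrives."""
    for event in events:
        for sink in sinks:
            sink.write(event)


def replay(staging: StagingArea, sinks: List[InMemorySink]) -> None:
    """Load sinks from the staged batches, without touching the source again."""
    for batch in staging.batches:
        for event in batch:
            for sink in sinks:
                sink.write(event)


def staged_pipeline(
    events: Iterable[Event], staging: StagingArea, sinks: List[InMemorySink]
) -> None:
    """Kafka -> staging -> database: land events first, then load from staging."""
    for event in events:
        staging.append(event)
    staging.flush()
    replay(staging, sinks)


# The list below plays the role of a Kafka topic (assumption for illustration).
events = [{"id": i, "type": "click"} for i in range(5)]

# Direct fan-out, as in the note above (Kafka to MongoDB, Hive, and HBase).
mongo, hive, hbase = InMemorySink("mongodb"), InMemorySink("hive"), InMemorySink("hbase")
direct_pipeline(events, [mongo, hive, hbase])

# Staged variant: the same events pass through a staging area first.
staging = StagingArea(batch_size=2)
warehouse = InMemorySink("warehouse")
staged_pipeline(events, staging, [warehouse])

# A sink added later can be filled from the staged batches alone.
late_sink = InMemorySink("added-later")
replay(staging, [late_sink])
```

In this sketch the staged variant keeps a second copy of the events, so a sink added later can be loaded from `staging` without re-reading the source. Whether that is worth the extra hop depends on the real system, for example on how long the topic retains data and how the target stores prefer to ingest.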