
Storing chat messages for a messaging microservice


I am building a real-time chat microservice, and this is my first time building such an application. I'm unsure how to store the conversations, because I think storing them in a database (Cassandra or whatever) is not an optimal solution in the long run.

I've read about Apache Kafka and found that it could be a solution to my problem. However, I'm not sure I'm there yet. I just need to know if Kafka is enough without an external database (talking only about the message data, not the users or any other data I may need). I've read that Kafka topics can be given a retention time of "-1", which basically means forever, but only as long as the server is running, and I'm not sure I can keep the same server running forever. Could someone please clarify this for me?


Solution

  • read about Apache Kafka and found that this could be a solution to my problem... need to know if Kafka is enough

    For (temporary) storage of the event itself, sure. For message delivery to clients, no.

    Some extreme examples come to mind immediately when considering any message broker for this:

    • you have one topic per "chat conversation", and your chat app becomes really popular: you now have hundreds of thousands of topics, which Kafka is not designed to handle well (every topic partition adds log files on the brokers and metadata the cluster must track). A database can be filtered/sharded/partitioned by many fields, including user id
    • chat messages must be ordered, and Kafka only guarantees ordering within a single partition, so you could really only use one partition per topic; with a database, you can always order by timestamp at query time
    • if you used a "firehose" approach, with one topic holding all messages from everybody that is then redistributed downstream, infrequent clients are going to lag behind the most active user-clients, because there are more messages to process.
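For comparison with the bullets above, here is a minimal sketch of the database side, using Python's built-in `sqlite3` module as a stand-in for whatever store you pick. The table, columns and helper functions are hypothetical. Messages are inserted out of order, yet `ORDER BY sent_at` returns them in order at query time, and a second query filters by sender without any change to how the data is laid out.

```python
import sqlite3

# Hypothetical schema: table, column and index names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    conversation_id TEXT    NOT NULL,
    sender_id       TEXT    NOT NULL,
    sent_at         INTEGER NOT NULL,  -- epoch milliseconds
    content         TEXT    NOT NULL
);
-- Composite indexes keep per-conversation and per-sender reads cheap.
CREATE INDEX IF NOT EXISTS idx_conv_time ON messages (conversation_id, sent_at);
CREATE INDEX IF NOT EXISTS idx_sender_time ON messages (sender_id, sent_at);
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_message(conn, conversation_id, sender_id, sent_at, content):
    conn.execute(
        "INSERT INTO messages (conversation_id, sender_id, sent_at, content) "
        "VALUES (?, ?, ?, ?)",
        (conversation_id, sender_id, sent_at, content),
    )
    conn.commit()


def history(conn, conversation_id, limit=50):
    # Order is decided at query time, whatever order the rows arrived in.
    rows = conn.execute(
        "SELECT sender_id, sent_at, content FROM messages "
        "WHERE conversation_id = ? ORDER BY sent_at LIMIT ?",
        (conversation_id, limit),
    )
    return rows.fetchall()


def messages_by_user(conn, sender_id):
    # Filtering on another field needs no new topic or partition layout.
    rows = conn.execute(
        "SELECT conversation_id, sent_at, content FROM messages "
        "WHERE sender_id = ? ORDER BY sent_at",
        (sender_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_store()
    # Inserted out of order on purpose.
    save_message(conn, "conv-1", "bob", 2000, "hi alice")
    save_message(conn, "conv-1", "alice", 1000, "hi bob")
    save_message(conn, "conv-2", "alice", 1500, "hello carol")
    print(history(conn, "conv-1"))
    print(messages_by_user(conn, "alice"))
```

In Cassandra the same access pattern is usually modelled with `conversation_id` as the partition key and the timestamp as a clustering column; querying by sender would then typically need its own table.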

    Most importantly, chat apps are frontend applications; Kafka is a backend technology and doesn't "push" to frontend services. For mobile/browser use cases, you generally can't embed a Kafka consumer in the client. A REST API layer would have to be added to request messages, just as it would with a database.
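A minimal sketch of such a REST layer, using only Python's standard library (`http.server`). The route `/conversations/<id>/messages?since=<ms>`, the in-memory `MESSAGES` dict and `fetch_messages` are hypothetical; in a real service the lookup would hit the database (possibly one fed by a Kafka consumer). The point is that the browser or mobile client only ever speaks HTTP.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory store standing in for the real database.
# Keys are conversation ids; each list is already ordered by sent_at.
MESSAGES = {
    "conv-1": [
        {"sender": "alice", "sent_at": 1000, "content": "hi bob"},
        {"sender": "bob", "sent_at": 2000, "content": "hi alice"},
    ],
}


def fetch_messages(conversation_id, since=0):
    """Return the messages in a conversation sent after `since` (epoch ms)."""
    return [m for m in MESSAGES.get(conversation_id, []) if m["sent_at"] > since]


class MessagesHandler(BaseHTTPRequestHandler):
    # Hypothetical route: GET /conversations/<id>/messages?since=<ms>
    def do_GET(self):
        url = urlparse(self.path)
        parts = url.path.strip("/").split("/")
        if len(parts) != 3 or parts[0] != "conversations" or parts[2] != "messages":
            self.send_error(404)
            return
        try:
            since = int(parse_qs(url.query).get("since", ["0"])[0])
        except ValueError:
            self.send_error(400, "since must be an integer")
            return
        body = json.dumps(fetch_messages(parts[1], since)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, run `HTTPServer(("127.0.0.1", 8000), MessagesHandler).serve_forever()` and open `http://127.0.0.1:8000/conversations/conv-1/messages?since=1000`, which should return only bob's message.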

    A better design would be to generate message events rather than store message data, e.g. "At time T, user X sends message M with content C to location/users L". A topic with that data does not need to be persisted forever, just long enough to be consumed. Even with all that information in the event, you still need to consider topic partitioning for scalability. Then you can filter the events and dump them into a more persistent location, because your user-facing application wouldn't be consuming every single one of those events.
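A rough sketch of that event design in Python, assuming a JSON payload and keying each record by conversation id. `MessageSent`, `to_record`, `partition_for` and `sink` are hypothetical names, and `partition_for` uses CRC32 only as a stand-in for Kafka's real partitioner (the Java producer's default hashes the key with murmur2). `retention.ms` and `cleanup.policy` are real Kafka topic settings, but the 7-day value is just an assumption for "long enough to be consumed".

```python
import json
import zlib
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple

# Illustrative topic settings: real Kafka config names, assumed values.
MESSAGE_EVENTS_TOPIC_CONFIG = {
    "cleanup.policy": "delete",                      # drop old segments
    "retention.ms": str(7 * 24 * 60 * 60 * 1000),    # 7 days (assumption)
}


@dataclass(frozen=True)
class MessageSent:
    """Hypothetical event: at time T, user X sends message M with content C to L."""
    sent_at: int                  # T, epoch milliseconds
    sender_id: str                # X
    message_id: str               # M
    content: str                  # C
    conversation_id: str          # L, the location the message is sent to
    recipients: Tuple[str, ...]   # L, the users the message is sent to


def to_record(event: MessageSent) -> Tuple[bytes, bytes]:
    """Serialize to a (key, value) pair. Keying by conversation keeps one
    conversation's events in one partition, so their relative order holds."""
    key = event.conversation_id.encode("utf-8")
    value = json.dumps(asdict(event)).encode("utf-8")
    return key, value


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in partitioner: only "same key -> same partition" matters here.
    return zlib.crc32(key) % num_partitions


def sink(records: List[Tuple[bytes, bytes]], wanted: set) -> Dict[str, List[dict]]:
    """Hypothetical downstream consumer: keep only the conversations the
    persistent store cares about and group them for a bulk write."""
    out: Dict[str, List[dict]] = {}
    for key, value in records:
        conversation_id = key.decode("utf-8")
        if conversation_id in wanted:
            out.setdefault(conversation_id, []).append(json.loads(value))
    return out
```

Keying by conversation means one conversation's events stay ordered within their partition while different conversations spread across partitions; the `sink` step is where you decide what actually reaches durable storage.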

    I'm not sure I can keep the same server running forever

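    Someone will have to. Kafka writes topic data to disk, so a broker restart does not by itself lose it, but the brokers and their storage still have to be operated and kept running. If you cannot do that, use a hosted solution.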