Search code examples
apache-camelapache-kafka-streams

Understanding the difference between Apache camel and Kafka Stream


Being quite familiar with Apache Camel, I am a new bee in Kafka Streams. I am learning Kafka streams, but could not find any relevant answer for the below query,

Being a library both Camel and Kafka Streams can create pipelines to extract data, polishing/transforming and load into some sink using a processor. Camel also supports stream processing. I want to understand the

  • difference between these two since I feel Camel library to be more generic than Kafka Stream which is not relevant for systems where there is no Kafka broker (no sure if this is wrong)
  • which library is recommended for which type of use case

Thanks in advance.


Solution

  • Kafka Streams is a stream processing framework, that consumes messages from Kafka topics and writes them back to other Kafka topics. It brings support for stateful transformations such as aggregations to tables and similar, leveraging RocksDB, when necessary. You can provide Rest endpoints to such tables/stores, but that is already extending the Kafka Streams features.

    Another possible extension is, to send messages somewhere else than Kafka. You will have to provide the client to do so yourself. With that regards, Kafka Streams' scope is much less versatile than Apache Camel. Because of that specialisation, it supports various Kafka specific features, such as parallel processing based on Kafka consumer groups, predefined message envelopes and exactly once semantics. One of the most important feature is the support of "stream time" in Kafka streams, which allows reprocessing of messages by their Kafka timestamps regardless of the Wall-Clock-Time.

    You can have a look on KSQL, which is build on top of Kafka Streams, to get an idea, what is possible to build with Kafka Streams.

    In short, if you have data in Kafka, that you want to process and write back to Kafka for other programs to consume, Kafka Streams is a very helpful framework. It even has a similar deployment model as Apache Camel. However, if you need to integrate different technologies with Kafka, you need to stay with Apache Camel. Note, there is Kafka Connect in the Apache Kafka family, that is geared towards the integration of data from other systems with Apache Kafka.