java apache-kafka hazelcast ignite infinispan

Kafka compared to modern In-Memory Data Grids (IMDG)


I have some IMDG experience, but I am rather new to Kafka. I am trying to understand the use case for Kafka. I understand it is a streaming/messaging platform. A lot of its features have counterparts in modern In-Memory Data Grids. Can you shed a bit of light on the use cases where someone would prefer Kafka and where they would prefer an IMDG? I need to draw a parallel.

I will give you one example. I have noticed Kafka being used for data replication. Although that is possible, I feel that IMDGs are more capable and more automated for this purpose.

I am also interested in how these two technologies complement each other, as I don't think they are in direct competition.


Solution

  • The two types of systems do have some feature overlap, but they are still different kinds of systems with dissimilar primary objectives, so we can't compare them on the primary feature of either.

    Kafka is primarily a durable pub/sub message broker. Data grids are primarily in-memory cache systems. This is the first distinction, and the key attribute on which one would choose between them.

    On a secondary level, which I believe is where the lines become blurred, both types of system provide some kind of distributed computing capabilities (Kafka Streams, Ignite or Hazelcast compute grid/service) with data ingestion functionality. This, however, cannot be taken as the primary selection criterion.

    The two types don't really compete with one another on their respective primary purposes. A stream-based compute engine may use a data grid for computation or for transient state caching, but I don't see it relying on a compute/data grid as a reliable, standalone message broker; for that it would depend on something like Kafka.

    A small application may dispense with one type and rely on the secondary features of the other, but an application with high demand for both may in fact need to use both types of systems.

    As an example, if you're building a high-volume data pipeline with multiple data sources and you need a durable message broker, you will probably have to use Kafka. If you also have strong requirements for low-latency querying downstream, you will need a data/compute grid as well, be it for caching or for distributed computing.
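
To make the distinction concrete, here is a minimal, dependency-free Java sketch of the two roles described above. It is a toy model only: `EventLog` and `GridView` are hypothetical classes invented for illustration, not Kafka's or any data grid's API. `EventLog` stands in for a durable topic (append-only, readable from any offset, reading removes nothing), and `GridView` stands in for a grid map that holds only the latest value per key for fast lookups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LogVersusGrid {

    /** Toy stand-in for a durable topic: an append-only list addressed by offset. */
    static final class EventLog {
        private final List<String[]> records = new ArrayList<>();

        /** Appends a key/value record and returns its offset. */
        synchronized long append(String key, String value) {
            records.add(new String[] {key, value});
            return records.size() - 1;
        }

        /** Returns a copy of every record from the given offset onward; reading removes nothing. */
        synchronized List<String[]> readFrom(long offset) {
            return new ArrayList<>(records.subList((int) offset, records.size()));
        }
    }

    /** Toy stand-in for a data grid map: keeps only the latest value per key. */
    static final class GridView {
        private final Map<String, String> latest = new ConcurrentHashMap<>();
        private long nextOffset = 0; // position of the next unread log record

        /** Consumes any unread log records and folds them into the key/value view. */
        void catchUp(EventLog log) {
            for (String[] record : log.readFrom(nextOffset)) {
                latest.put(record[0], record[1]);
                nextOffset++;
            }
        }

        /** Point lookup by key, the access pattern a grid is built for. */
        String get(String key) {
            return latest.get(key);
        }
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append("sensor-1", "20.5");
        log.append("sensor-2", "18.0");
        log.append("sensor-1", "21.0");

        GridView view = new GridView();
        view.catchUp(log);

        System.out.println(view.get("sensor-1"));   // latest value only: 21.0
        System.out.println(log.readFrom(0).size()); // full history still replayable: 3
    }
}
```

In a real pipeline the log role would be played by a Kafka topic and the view by a grid map (for example in Hazelcast, Ignite or Infinispan). The point of the sketch is only that the log keeps every event for replay by any number of consumers, while the grid answers "what is the current value for this key?" quickly.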