apache-kafka, kafka-consumer-api, kafka-producer-api

Get the best out of kafka-node


I need a recommendation on the correct Kafka structures to use for my project, and why.

My project: I'm creating a platform for investment-bot management. At a very high level, you can code several investment strategies and upload them to the platform, and they will execute in real time, providing analytics and real-time performance info. The strategies are fed information from 4 streams of data, which they read from 4 different Kafka topics. These Kafka topics receive the information directly from the exchange's WebSocket. There is a dynamic number of bots on the platform at any given time.

What I have done is the following: I used the wurstmeister/kafka and ZooKeeper images to initialise Kafka, and I initialise all the Kafka topics I will need beforehand. I push the required data to Kafka by producing all the information to the topics with:

payloads = [
    { topic: topic, messages: JSON.stringify(message), partition: 0 }
]
// producer.send in kafka-node is callback-based, so await has no effect here;
// handle errors in the callback instead
producer.send(payloads, function (err, data) {
    if (err) console.error(err)
})
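Since the topics are created up front, it is worth giving them more than one partition, because a single-partition topic can only ever be consumed by one member of a consumer group. This is a sketch of the upfront topic creation with kafka-node's KafkaClient.createTopics; the topic names, partition counts, and the KAFKA_HOST environment variable are illustrative assumptions, not part of the original setup:

```javascript
// Sketch: create the four data-stream topics up front with explicit partition counts.
// Assumptions: a broker reachable at KAFKA_HOST (e.g. 'localhost:9092'); topic names
// like 'prices' are placeholders for the real stream names.

// Pure helper: describe a topic with enough partitions to fan out to several consumers.
function topicSpec(name, partitions) {
  return { topic: name, partitions: partitions, replicationFactor: 1 };
}

let kafka;
try {
  kafka = require('kafka-node');
} catch (e) {
  // kafka-node not installed; topicSpec can still be reused elsewhere
}

const kafkaHost = process.env.KAFKA_HOST; // e.g. 'localhost:9092'
if (kafka && kafkaHost) {
  const client = new kafka.KafkaClient({ kafkaHost: kafkaHost });
  const topics = ['prices', 'trades', 'orderbook', 'candles'].map(function (t) {
    return topicSpec(t, 4); // 4 partitions -> up to 4 active consumers per group
  });
  client.createTopics(topics, function (err, result) {
    if (err) console.error(err);
    else console.log('created:', result);
  });
}
```

With replicationFactor raised above 1 (on a multi-broker cluster), the same call also buys redundancy on the storage side.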

I then have the strategies read from the topics through a simple consumer, like so:

consumer = new Consumer(client, [{ topic: topic, partition: 0 }]);
consumer.on('message', function (message) {

    // Parse the value consumed from Kafka
    parsedPrice = JSON.parse(message.value)
})

The objective is to discuss how I can use Kafka to ensure, first, that I can access the topics from several different consumers, and second, that there is enough redundancy to guarantee very high uptime.


Solution

  • If you want to access one topic from several consumers, you can set up a consumer group, where one or more consumers work together to consume a topic.

    In a consumer group, each consumer consumes only particular partitions. For example, if you have 4 partitions you can use 4 consumers so that each partition is mapped to one consumer (or you could have, say, 2 consumers consuming 2 partitions each). In this scenario each consumer sees only some of the messages, not all of them. This lets consumers scale to topics with a large number of messages, and if one consumer fails, the other consumers in the group rebalance the partitions to take over for it.

    If you want each consumer to consume all messages, put the consumers in separate consumer groups – each group then works just like the above, but consumes the whole topic independently. To make the setup highly available, you could also have a failover consumer: it would sit idle and only start consuming messages if the original consumer failed.

    This link explains in a little more detail: https://www.oreilly.com/library/view/kafka-the-definitive/9781491936153/ch04.html
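    A minimal sketch of both patterns with kafka-node's ConsumerGroup follows; the group IDs, the 'prices' topic, and the KAFKA_HOST environment variable are assumptions for illustration:

    ```javascript
    // Sketch: scaling within one consumer group vs. independent consumption
    // across groups, using kafka-node's ConsumerGroup.
    // Assumptions: broker at KAFKA_HOST, a 'prices' topic with several partitions.

    // Pure helper: build the options for one group member.
    function buildGroupOptions(kafkaHost, groupId) {
      return {
        kafkaHost: kafkaHost,      // e.g. 'localhost:9092'
        groupId: groupId,          // members sharing a groupId split the partitions
        autoCommit: true,
        fromOffset: 'latest',
        protocol: ['roundrobin']   // partition assignment strategy
      };
    }

    let ConsumerGroup;
    try {
      ({ ConsumerGroup } = require('kafka-node'));
    } catch (e) {
      // kafka-node not installed; the helper above still documents the options
    }

    const kafkaHost = process.env.KAFKA_HOST;
    if (ConsumerGroup && kafkaHost) {
      // Two members of the SAME group: Kafka splits the topic's partitions
      // between them, and rebalances onto the survivor if one dies.
      const member1 = new ConsumerGroup(buildGroupOptions(kafkaHost, 'strategy-group'), ['prices']);
      const member2 = new ConsumerGroup(buildGroupOptions(kafkaHost, 'strategy-group'), ['prices']);

      // A consumer in a DIFFERENT group: independently receives every message.
      const auditor = new ConsumerGroup(buildGroupOptions(kafkaHost, 'analytics-group'), ['prices']);

      function attach(consumer, label) {
        consumer.on('message', function (message) {
          const parsed = JSON.parse(message.value);
          console.log(label, 'partition', message.partition, parsed);
        });
        consumer.on('error', function (err) { console.error(label, err); });
      }

      attach(member1, 'strategy-1');
      attach(member2, 'strategy-2');
      attach(auditor, 'auditor');
    }
    ```

    Each bot strategy would get its own groupId so every strategy sees the full stream, while instances of the same strategy can share a groupId to split the load.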

    Hope this helps!