spring-boot spring-cloud spring-kafka spring-cloud-stream spring-cloud-stream-binder-kafka

Multiple ProducerIds are created when there are multiple instances of Producers


In the .yaml file, we have set

spring.cloud.stream.kafka.binder.configuration.enable.idempotence to true.

Now when the application starts up, we can see a log like

[kafka-producer-network-thread | test_clientId] org.apache.kafka.clients.producer.internals.TransactionManager - [Producer clientId=test_clientId] ProducerId set to 0 with epoch 0

When the first message is produced to the topic, we can see that another ProducerId is used, as shown in the log below:

[Ljava.lang.String;@720a86ef.container-0-C-1] org.apache.kafka.clients.producer.KafkaProducer - [Producer clientId=test_clientId] Instantiated an idempotent producer.
[Ljava.lang.String;@720a86ef.container-0-C-1] org.apache.kafka.common.utils.AppInfoParser - Kafka version : 2.0.1
[Ljava.lang.String;@720a86ef.container-0-C-1] org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : fa14705e51bd2ce5
[kafka-producer-network-thread | test_clientId] org.apache.kafka.clients.Metadata - Cluster ID: -9nblycHSsiksLIUbVH6Vw
1512361 INFO [kafka-producer-network-thread | test_clientId] org.apache.kafka.clients.producer.internals.TransactionManager - [Producer clientId=test_clientId] ProducerId set to 1 with epoch 0

Once the ProducerId is set to 1, sending further messages from this application does not create any new ProducerIds.

But if we have multiple applications running (all connecting to the same Kafka server), each instance also creates new ProducerIds, both at startup and when sending its first message.

Is there a way to prevent new ProducerIds from being created and keep using the one that was created when the application started? Also, since a lot of ProducerIds are created, is there some way to reuse the ones that already exist? (Assume the application has multiple producers and each one creates multiple ProducerIds.)


Solution

  • The first producer is temporary - it is created to find the existing partitions for the topic during initialization. It is immediately closed.

    The second producer is a single producer used for subsequent record sends.

    The producerId and epoch are allocated by the broker. They have to be unique.

    With a new broker you will get 0 and 1 for the first instance, 2 and 3 for the second instance, 4 and 5, ...

    Even if you stop all instances, the next one will get 6 and 7.

    Why do you worry about this?

    On the other hand, if you set the client.id to, say, foo, you will always get foo-1 and foo-2 on all instances.
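
The allocation pattern described in this answer can be sketched as a toy model in Java. This is not Kafka code: `ToyBroker`, `allocateProducerId`, and `startInstance` are hypothetical names, and a single ever-growing counter stands in for whatever the broker really does internally. It only illustrates why each application instance consumes two ids (one for the temporary producer, one for the long-lived producer) and why ids from stopped instances are not handed out again.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ProducerIdSketch {

    /** Toy stand-in for the broker (assumption): ids come from a counter that only grows. */
    static final class ToyBroker {
        private final AtomicLong nextProducerId = new AtomicLong(0);

        long allocateProducerId() {
            return nextProducerId.getAndIncrement();
        }
    }

    /**
     * Models one application instance starting up and sending its first record.
     * Returns {id of the temporary producer, id of the long-lived producer}.
     */
    static long[] startInstance(ToyBroker broker) {
        // Temporary producer: created during initialization to look up partitions, then closed.
        long temporary = broker.allocateProducerId();
        // Long-lived producer: created for the first send and reused for all later sends.
        long longLived = broker.allocateProducerId();
        return new long[] {temporary, longLived};
    }

    public static void main(String[] args) {
        ToyBroker broker = new ToyBroker();

        // Three instances start against the same (new) broker.
        for (int i = 1; i <= 3; i++) {
            long[] ids = startInstance(broker);
            System.out.println("instance " + i + ": " + ids[0] + " and " + ids[1]);
        }

        // All instances are stopped, then a new one starts: the counter does not rewind.
        long[] later = startInstance(broker);
        System.out.println("after restart: " + later[0] + " and " + later[1]);
    }
}
```

In this model, further sends from a running instance never call `allocateProducerId` again, which matches the observation in the question that no new ProducerIds appear after the first message.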