Tags: kubernetes, rabbitmq, kubernetes-helm, bitnami, rabbitmq-shovel

RabbitMQ Shovel Plugin - Creating duplicate data in case of node failure


I am using the Shovel plugin in RabbitMQ, and it works fine with a single pod. However, we are running RabbitMQ on a Kubernetes cluster with multiple pods, and after a pod restart each pod starts its own instance of the shovel independently, which causes duplicate message replication at the destination.

Detailed steps are below:

  1. We deploy RabbitMQ on a Kubernetes cluster using the Helm chart.

  2. We then create the shovel using the RabbitMQ Management UI. Once it is created from the UI, the shovel works fine and does not replicate data multiple times to the destination (an equivalent CLI definition is sketched after this list).

  3. When any pod is restarted, it creates a separate shovel instance. This starts causing duplicate message replication on the destination from the different shovel instances.

  4. When we check the shovel status in the RabbitMQ Management UI, we see multiple instances of the same shovel, one running on each pod.

  5. When we restart the shovel manually from the RabbitMQ Management UI, the issue is resolved and only one instance is visible in the UI.
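For reference, a dynamic shovel like the one created in step 2 can also be declared from the CLI instead of the Management UI. This is only a minimal sketch: the shovel name, vhost, URIs, and queue names below are placeholders and must be adapted to your setup.

```bash
# Declare a dynamic shovel named "my-shovel" on the default vhost.
# All URIs and queue names here are placeholders.
rabbitmqctl set_parameter shovel my-shovel \
  '{"src-protocol": "amqp091",
    "src-uri": "amqp://user:password@source-rabbitmq:5672",
    "src-queue": "source-queue",
    "dest-protocol": "amqp091",
    "dest-uri": "amqp://user:password@destination-rabbitmq:5672",
    "dest-queue": "destination-queue"}'

# Inspect where the shovel is running; in a healthy cluster each
# dynamic shovel should appear exactly once across all nodes.
rabbitmqctl shovel_status
```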

Our conclusion is that, in case of a pod failure/restart, the shovel is not able to sync with the other nodes/pods when another instance of the same shovel is already running there. We can work around the issue by restarting the shovel from the UI, but that is not a viable approach for production. We do not see this issue with queues and exchanges.

Can anyone help us resolve this issue?


Solution

  • As we have recently seen similar problems, this appears to be an issue introduced in one of the 3.8.x releases: https://github.com/rabbitmq/rabbitmq-server/discussions/3154

    As far as I understand, it should be fixed from version 3.8.20 on. See

    https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.8.19 and https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.8.20 and https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.9.2

    I haven't had time yet to verify that those versions really fix it.
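    If you are on the Bitnami chart, one way to pick up a fixed release is to pin the image tag to 3.8.20+ (or 3.9.2+) and let the cluster roll. A rough sketch, assuming the release is named `rabbitmq` in the `rabbitmq` namespace and the chart exposes the usual `image.tag` value; verify the exact value keys against your chart version with `helm show values bitnami/rabbitmq`:

    ```bash
    # Pin the RabbitMQ image to a release that contains the shovel fix.
    # Release name, namespace, and the image.tag key are assumptions
    # based on the Bitnami chart defaults.
    helm upgrade rabbitmq bitnami/rabbitmq \
      --namespace rabbitmq \
      --reuse-values \
      --set image.tag=3.9.2

    # Confirm the running server version after the rolling restart.
    kubectl exec -n rabbitmq rabbitmq-0 -- rabbitmq-diagnostics server_version
    ```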