I am given X RabbitMQ queues. Some of them contain duplicate messages (for example, the same message is stored in queue A as well as in queue B).
I am trying to achieve one thing: process all the messages from the "input" queues (I wrote a consumer that connects to these queues), remove duplicate messages on the fly, and send the resulting data to a single output queue.
What would be the fastest and most efficient way to do this?
As far as I know, the AMQP message_id property is optional, so I have to implement some way of comparing newly arrived messages against the ones already "seen" to achieve my goal.
Hashing the message bodies came to mind, but as I am relatively new to algorithms, I am not sure which hash function to use or what to focus on.
I ended up hashing each message body with SHA-1 and storing the hashes of the messages seen so far. Messages that have not been seen are forwarded to the result queue; messages that have already been seen are discarded.
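
For reference, here is a minimal sketch of the hashing approach described above, stripped of the RabbitMQ parts so it runs on its own. The names `make_deduplicator`, `forward_unique` and `publish` are hypothetical and only for illustration; in the real consumer the `is_new` check would presumably sit inside the message callback and `publish` would send to the output queue. It assumes that two messages are duplicates exactly when their bodies are byte-for-byte identical.

```python
import hashlib


def make_deduplicator():
    """Return a function that reports whether a message body is new.

    Hypothetical helper: keeps the SHA-1 digests of all bodies seen so far
    in an in-memory set, so memory grows with the number of unique messages.
    """
    seen = set()

    def is_new(body: bytes) -> bool:
        digest = hashlib.sha1(body).digest()  # 20-byte digest of the body
        if digest in seen:
            return False  # already seen: caller should discard the message
        seen.add(digest)
        return True  # first occurrence: caller should forward the message

    return is_new


def forward_unique(bodies, publish):
    """Pass each distinct body to `publish` once, in arrival order.

    `bodies` stands in for the stream of messages from the input queues and
    `publish` for whatever sends a message to the output queue.
    """
    is_new = make_deduplicator()
    for body in bodies:
        if is_new(body):
            publish(body)


if __name__ == "__main__":
    result = []
    forward_unique([b"a", b"b", b"a", b"c", b"b"], result.append)
    print(result)
```

Storing the fixed-size digest instead of the whole body keeps the "seen" set small for large messages, and set lookups take constant time on average. The set is never pruned in this sketch, so a long-running consumer would probably need some expiry policy or an external store.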