python parallel-processing rabbitmq distributed-computing distributed

RabbitMQ - basic_qos function and how to maximize throughput


I have been using RabbitMQ to build a distributed web crawler. So far, I have been calling this function so that each consumer gets only 1 message at a time.

channel.basic_qos(prefetch_count=1)

From what I understand, no matter how many queues the channel consumes from, it will always process just 1 message at a time.

Is there a way to maximize the number of messages processed at a time? I don't want to set prefetch_count to a static value; instead, I want to process as many messages as my computer can handle at a given time.


Solution

  • If you don't specify a prefetch count (QoS), RabbitMQ will send your consumer as many messages as the connection can handle. So to remove the limit, just don't call basic_qos at all.

    To maximise the throughput of your connection, don't send an ack per message. Instead, call basic_ack with multiple=True, which acknowledges every message up to and including the given delivery tag, and acknowledge large batches of messages at a time.

    This comes with a risk. If your connection dies, every unacknowledged message is redelivered, and if you batch your acks, that includes messages you had already processed but not yet acked. But if your consumer is a web crawler, the worst that can happen is that it crawls a site or page twice, so it's no big deal.
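
A minimal sketch of the batched-ack approach described above, assuming the pika client library (1.x) and a broker running on localhost. The names `BatchAcker`, `run_consumer`, `handle`, `batch_size`, and the queue name `crawl_queue` are hypothetical and chosen only for illustration; they are not part of pika or of the original question.

```python
class BatchAcker:
    """Acknowledge messages in batches using basic_ack(multiple=True).

    Hypothetical helper, not part of pika. `channel` is any object that
    exposes basic_ack(delivery_tag=..., multiple=...), e.g. a pika channel.
    """

    def __init__(self, channel, batch_size=100):
        self.channel = channel
        self.batch_size = batch_size
        self.pending = 0        # messages processed but not yet acked
        self.last_tag = None    # delivery tag of the latest processed message

    def processed(self, delivery_tag):
        """Record one processed message; ack once a full batch is pending."""
        self.pending += 1
        self.last_tag = delivery_tag
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        """Ack everything up to and including the last processed tag."""
        if self.pending:
            self.channel.basic_ack(delivery_tag=self.last_tag, multiple=True)
            self.pending = 0


def run_consumer(handle, queue="crawl_queue", batch_size=100):
    """Wire the helper into a pika consumer (assumes the queue already exists)."""
    import pika  # imported lazily so BatchAcker can be used without pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    # Deliberately no basic_qos call: without a prefetch limit the broker
    # pushes messages as fast as the connection allows.
    acker = BatchAcker(channel, batch_size)

    def on_message(ch, method, properties, body):
        handle(body)                           # e.g. crawl the URL in `body`
        acker.processed(method.delivery_tag)   # ack is deferred until a batch fills

    channel.basic_consume(queue=queue, on_message_callback=on_message, auto_ack=False)
    try:
        channel.start_consuming()
    except KeyboardInterrupt:
        channel.stop_consuming()
        acker.flush()          # ack the final partial batch on a clean shutdown
        connection.close()
```

A larger `batch_size` means fewer ack frames on the wire, but also more already-processed messages redelivered if the connection drops before the next ack, which is the trade-off noted above.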