python parallel-processing rabbitmq distributed-computing distributed

RabbitMQ - Basic_qos function and how to maximize it

I have been using RabbitMQ to build a distributed web crawler. So far, I have been calling this function so that each worker only gets 1 request at a time.

channel.basic_qos(prefetch_count=1)

From what I understand, no matter how many queues the channel consumes from, it will always process just 1 message at a time.

Is there a way to maximize the number of messages processed at a time? I don't want to hard-code a prefetch_count; instead I want each worker to process as many messages as my computer can handle at a given time.

Solution

If you don't specify a prefetch (qos) then RabbitMQ will send your consumer as many messages as the connection can handle. So just don't call basic_qos at all.

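To maximise the throughput of your connection, don't send an ack per message. Instead, call basic_ack with multiple=True, which acknowledges every unacknowledged message on the channel up to and including the given delivery tag, and acknowledge large batches of messages at a time.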

This comes with risk. If your connection dies, every unacknowledged message is redelivered, and if you batch your acks that includes messages you have already processed but not yet acknowledged. But if your consumer is a web crawler, the worst that could happen is that it crawls a site or page twice, so no big deal.
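The batched acknowledgement described above could be sketched roughly as follows, assuming the pika 1.x client. The `BatchAcker` helper, the `consume` function, the `handle` callback, the queue name `crawl_requests`, and the batch size of 100 are all hypothetical names and values chosen for illustration; only `basic_consume`, `basic_ack`, and `start_consuming` are pika calls. Note that `basic_qos` is never called, so no prefetch limit is set.

```python
class BatchAcker:
    """Hypothetical helper: counts processed messages and sends one
    basic_ack(multiple=True) per batch instead of one ack per message."""

    def __init__(self, channel, batch_size=100):
        self.channel = channel
        self.batch_size = batch_size
        self.pending = 0       # messages processed since the last ack
        self.last_tag = None   # delivery tag of the most recent message

    def processed(self, delivery_tag):
        """Record one processed message; ack once the batch is full."""
        self.pending += 1
        self.last_tag = delivery_tag
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        """Ack everything up to and including the last delivery tag seen."""
        if self.pending:
            self.channel.basic_ack(delivery_tag=self.last_tag, multiple=True)
            self.pending = 0


def consume(handle, queue_name="crawl_requests", batch_size=100):
    """Sketch of a consumer with no prefetch limit and batched acks.

    `handle` is a hypothetical function that processes one message body.
    Assumes a broker on localhost and an already declared queue.
    """
    import pika  # imported here so BatchAcker can be used without pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    acker = BatchAcker(channel, batch_size)

    def on_message(ch, method, properties, body):
        handle(body)
        acker.processed(method.delivery_tag)

    # basic_qos is deliberately not called, so no prefetch limit applies.
    channel.basic_consume(queue=queue_name, on_message_callback=on_message)
    try:
        channel.start_consuming()
    except KeyboardInterrupt:
        acker.flush()  # ack the final partial batch before shutting down
        connection.close()
```

One limitation of this sketch: if the queue goes quiet, a partial batch stays unacknowledged until more messages arrive or the consumer shuts down, so a real worker would probably also flush on a timer.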