Tags: c#, rabbitmq, mqtt, amqp

Cloud-to-on-premises communication scaling in RMQ


We have an application that consists of two parts: a cloud backend API and an on-premises service that the client installs. The communication between the backend and the client's service goes through an RMQ server, to which the backend connects via AMQP and the client via MQTT.

The typical usage goes like this:

  1. The caller invokes the API via HTTPS
  2. The API finds the appropriate service
  3. The API puts the request into a permanent downstream queue for this particular client
  4. The API creates a temporary upstream queue for this particular request, binds it to an MQTT topic for responses, and starts listening to it
  5. The client receives the message from an MQTT topic, takes up to 10 seconds to process it, and publishes the response to the response topic
  6. The API receives the response, closes the temporary queue, and returns the result in the HTTPS response

Everything happens in the context of a single long-running HTTP request.
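
In code, the per-request part of this flow (steps 3, 4 and 6) looks roughly like the sketch below. It assumes the RabbitMQ.Client AMQP library (6.x) on the API side and the MQTT plugin's default behaviour of publishing to the amq.topic exchange with "/" mapped to "."; the topic layout, method name and parameters are illustrative, not the project's actual code:

    using System;
    using System.Threading.Tasks;
    using RabbitMQ.Client;
    using RabbitMQ.Client.Events;

    // Illustrative sketch of the per-request flow on the API side.
    static async Task<byte[]> SendRequestAndWaitAsync(IConnection connection, string clientId, byte[] requestBody)
    {
        var correlationId = Guid.NewGuid().ToString();
        using var channel = connection.CreateModel();

        // Step 3: publish the request towards the client's permanent downstream
        // queue (assumed here to be fed by an MQTT subscription on "requests/{clientId}").
        var props = channel.CreateBasicProperties();
        props.CorrelationId = correlationId;
        channel.BasicPublish(exchange: "amq.topic", routingKey: $"requests.{clientId}",
                             basicProperties: props, body: requestBody);

        // Step 4: temporary, exclusive, server-named queue bound to the MQTT
        // response topic "responses/{clientId}/{correlationId}".
        var upstream = channel.QueueDeclare();
        channel.QueueBind(queue: upstream.QueueName, exchange: "amq.topic",
                          routingKey: $"responses.{clientId}.{correlationId}");

        var tcs = new TaskCompletionSource<byte[]>(TaskCreationOptions.RunContinuationsAsynchronously);
        var consumer = new EventingBasicConsumer(channel);
        consumer.Received += (_, ea) => tcs.TrySetResult(ea.Body.ToArray());
        channel.BasicConsume(queue: upstream.QueueName, autoAck: true, consumer: consumer);

        // Step 6: wait for the single response (with a timeout in practice);
        // disposing the channel afterwards removes the temporary queue.
        return await tcs.Task;
    }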

The problem is that this approach does not scale well. When doing a load test with a couple hundred users sending requests simultaneously, the RMQ server at some point stops creating new channels, either with a timeout:

System.TimeoutException: The operation has timed out.
    at RabbitMQ.Client.Impl.ModelBase.ModelRpc(MethodBase method, ContentHeaderBase header, Byte[] body)
    at RabbitMQ.Client.Framing.Impl.AutorecoveringConnection.CreateModel()

Or with an outright failure:

RabbitMQ.Client.Exceptions.ChannelAllocationException: The connection cannot support any more channels. Consider creating a new connection
    at RabbitMQ.Client.Impl.SessionManager.Create()
    at RabbitMQ.Client.Framing.Impl.AutorecoveringConnection.CreateModel()

The channel limit exception can (theoretically) be overcome by opening multiple connections, but that will not help with the timeouts. Neither of the resources available on my PC (CPU time, memory) is being exhausted.
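
For what it's worth, the multiple-connections workaround would look roughly like this; RequestedChannelMax is the client-side cap per connection, the broker's own channel_max still applies, and the host name and pool size below are made-up values:

    using System.Linq;
    using RabbitMQ.Client;

    // Illustrative only: spread channels over several connections instead of one.
    var factory = new ConnectionFactory
    {
        HostName = "rabbitmq.example.local",   // hypothetical host
        RequestedChannelMax = 2047             // client-side cap; the lower of this and the broker's channel_max wins
    };

    var pool = Enumerable.Range(0, 4)          // arbitrary pool size
                         .Select(_ => factory.CreateConnection())
                         .ToArray();

    // Naive round-robin: pick a connection per request, then open the channel on it.
    IModel ChannelFor(int requestNumber) =>
        pool[requestNumber % pool.Length].CreateModel();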

I also considered having one upstream queue for all responses from a single client, but it's not clear how the API would be able to match results to individual requests.

Is there a better approach?


Solution

  • We found two approaches and experimented with both:

    1. Using Direct Reply-To (a minimal sketch appears after this list)

    Key points are:

    • The same channel (and therefore the same connection) must be used for sending the request and reading the response
    • Consuming the response must start before the request is sent
    • It does not work with the MQTT plugin, so both parties must connect via AMQP

    2. Using a single reader (a sketch appears at the end of this answer)

    Key points:

    • All responses are routed to a single queue, from which a single consumer reads
    • The consumer puts the data it reads into some sort of cache (e.g. in-memory, Redis, a database), where results can be looked up by the operation ID
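
    For the first approach, a minimal Direct Reply-To sketch with the RabbitMQ.Client AMQP library (.NET, 6.x assumed) might look like the following; the method name, request queue name and signature are illustrative, not the project's actual code:

        using System;
        using System.Threading.Tasks;
        using RabbitMQ.Client;
        using RabbitMQ.Client.Events;

        // Direct Reply-To: the request is published and the reply is consumed on the SAME channel.
        static async Task<byte[]> CallViaDirectReplyToAsync(IConnection connection,
                                                            string requestQueue,
                                                            byte[] requestBody)
        {
            using var channel = connection.CreateModel();

            var tcs = new TaskCompletionSource<byte[]>(TaskCreationOptions.RunContinuationsAsynchronously);
            var consumer = new EventingBasicConsumer(channel);
            consumer.Received += (_, ea) => tcs.TrySetResult(ea.Body.ToArray());

            // Consuming from the pseudo-queue must start before the request is
            // published, and it must use auto-ack (no-ack) mode.
            channel.BasicConsume(queue: "amq.rabbitmq.reply-to", autoAck: true, consumer: consumer);

            var props = channel.CreateBasicProperties();
            props.ReplyTo = "amq.rabbitmq.reply-to";
            channel.BasicPublish(exchange: "", routingKey: requestQueue,
                                 basicProperties: props, body: requestBody);

            // The responder publishes its answer to the default exchange with the
            // ReplyTo value it received; no per-request queue is created on the broker.
            return await tcs.Task;
        }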

    In the end we went with the second approach, because losing the ability to communicate over an MQTT connection would have been a dealbreaker for the project.
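
    For the single-reader approach, a rough sketch of the reader side is below, using an in-memory cache (Redis or a database would sit behind the same lookup). The class name, queue and topic layout are illustrative assumptions; in particular it assumes clients publish responses to MQTT topics like "responses/{clientId}/{operationId}", which the MQTT plugin maps to routing keys on the amq.topic exchange:

        using System.Collections.Concurrent;
        using RabbitMQ.Client;
        using RabbitMQ.Client.Events;

        // A single consumer drains one shared response queue and stores each payload
        // in a cache keyed by operation ID.
        class ResponseCache
        {
            private readonly ConcurrentDictionary<string, byte[]> _results = new();

            public ResponseCache(IConnection connection, string responseQueue)
            {
                var channel = connection.CreateModel();
                channel.QueueDeclare(queue: responseQueue, durable: true, exclusive: false, autoDelete: false);

                // One binding catches every client's responses ("#" matches all remaining topic levels).
                channel.QueueBind(queue: responseQueue, exchange: "amq.topic", routingKey: "responses.#");

                var consumer = new EventingBasicConsumer(channel);
                consumer.Received += (_, ea) =>
                {
                    // The operation ID is the last segment of the routing key / MQTT topic.
                    var operationId = ea.RoutingKey.Split('.')[^1];
                    _results[operationId] = ea.Body.ToArray();
                };
                channel.BasicConsume(queue: responseQueue, autoAck: true, consumer: consumer);
            }

            // The HTTP handler polls (or waits with a timeout on) this lookup after
            // publishing its request with the same operation ID.
            public bool TryGetResult(string operationId, out byte[] result) =>
                _results.TryRemove(operationId, out result);
        }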