
Rasa Agent does not support more than 23 req/sec


While stress testing a Rasa agent before deploying it to production, we found that it only supports 23 requests per second with a response time of 1 second.

If we push the load above 23 requests per second, the response time increases gradually and eventually exceeds 5 seconds, regardless of the hardware.

Is there any way to eliminate this limit?

I am on Rasa version 2.1.2.
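
For anyone who wants to reproduce a measurement like this, here is a minimal sketch of a load test using only the Python standard library. It assumes the Rasa REST channel is enabled and reachable at `http://localhost:5005/webhooks/rest/webhook`; the URL, the function names, and the request counts are illustrative assumptions, not the setup used in the test described above.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: the REST channel is enabled and the server listens here.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"


def send_message(text="hello", sender="load-test", url=RASA_URL, timeout=10.0):
    """POST one message to the REST channel and return its latency in seconds."""
    body = json.dumps({"sender": sender, "message": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()  # drain the response so the full round trip is timed
    return time.perf_counter() - start


def run_load_test(send, total_requests, concurrency):
    """Call `send` total_requests times from a thread pool and summarize the run."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: send(), range(total_requests)))
    elapsed = time.perf_counter() - start
    return {
        "requests": total_requests,
        "req_per_sec": total_requests / elapsed,
        "mean_latency": sum(latencies) / len(latencies),
        "max_latency": max(latencies),
    }

# Example call against a running server (hypothetical numbers):
# print(run_load_test(send_message, total_requests=200, concurrency=25))
```

Raising `concurrency` step by step and watching where `mean_latency` starts to climb gives a rough picture of the saturation point. A dedicated load-testing tool would give more reliable numbers than this thread-pool sketch.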


Solution

  • We did a similar exercise for kAIron, which internally uses Rasa 2.1.2, and we were also unable to get more than 23 req/sec. Even with our own chat server implementation built on Tornado, which is still under development, we reach at most 32 req/sec with a response time of up to 1 second.

    In our experience, one way to achieve more concurrency is to deploy the Rasa chat server on Kubernetes with horizontal scaling.
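
To get a feel for what horizontal scaling implies, here is a rough back-of-the-envelope sketch for estimating how many replicas a target load would need. It assumes throughput scales roughly linearly with the number of replicas and that each replica handles about 23 req/sec, as measured above. The `headroom` factor and the function name are hypothetical choices for illustration, and real scaling is rarely perfectly linear.

```python
import math


def replicas_needed(target_rps, per_replica_rps=23.0, headroom=0.8):
    """Estimate the replica count for a target request rate.

    Assumes linear scaling. `headroom` is the fraction of each replica's
    measured capacity we plan to use (0.8 leaves 20% spare).
    """
    if target_rps < 0 or per_replica_rps <= 0 or not 0 < headroom <= 1:
        raise ValueError("invalid load parameters")
    usable_rps = per_replica_rps * headroom
    # Always keep at least one replica running, even for zero load.
    return max(1, math.ceil(target_rps / usable_rps))


# Example: estimate the replicas for 100 req/sec with 20% headroom.
print(replicas_needed(100))
```

The result is only a starting point for sizing a deployment. The actual count should be confirmed by load testing the scaled-out setup.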