Tags: azure, performance-testing, load-testing

Azure Load Test. Understanding how virtual users affect performance


I'm trying to make sense of the results of a recent Azure cloud load test that we ran against one of our APIs.

As I'm testing an API, I've configured the load test virtual users to have no think time. So essentially, every time a virtual user receives a response, it sends another request straight away.
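For reference, each virtual user is effectively running a loop like the minimal sketch below (this is not the Azure Load Testing engine itself, and the URL and payload are placeholders rather than our real API):

```python
import threading
import requests

# Placeholder endpoint and payload -- illustration only, not the real API under test.
URL = "https://example.com/api/calculate"
PAYLOAD = {"values": [1.0, 2.0, 3.0]}

def virtual_user(stop: threading.Event) -> None:
    """Closed-loop virtual user with zero think time: the next request
    is sent as soon as the previous response arrives."""
    session = requests.Session()
    while not stop.is_set():
        resp = session.post(URL, json=PAYLOAD, timeout=30)
        resp.raise_for_status()
        # No sleep here -- loop straight back and POST again.

def run_test(num_vus: int, duration_s: float) -> None:
    """Run num_vus closed-loop virtual users in parallel for duration_s seconds."""
    stop = threading.Event()
    threads = [threading.Thread(target=virtual_user, args=(stop,))
               for _ in range(num_vus)]
    for t in threads:
        t.start()
    stop.wait(duration_s)  # nothing sets the event early, so this just waits out the test
    stop.set()
    for t in threads:
        t.join()
```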

We are also not using any kind of user sessions or caching any data per user. It's a basic test that posts some JSON to an endpoint on the API, which then performs some calculations on the data it receives.

It appears that by changing the number of virtual users, we can make the service more performant, by which I mean it can respond faster and still process more requests per second.

The results of two load tests are shown below.

The first test tells me that our API is capable of processing 60k requests in 2 minutes.

What I can't understand is why adding more virtual users increases the average response time and lowers the RPS, which in turn causes the API to process only 55k requests in 2 minutes.

Why would the API now only be able to handle 460 RPS, when we already know it can handle 500 RPS?


Solution

  • There are three questions here: (1) why more virtual users increase response time; (2) why more VUs decrease RPS; and (3) why more VUs decrease the total number of requests.

    Here are the explanations:

    1. More concurrent VUs create more concurrent sessions, which require more resources on the server (e.g. session context, queue sizes, thread concurrency). That increases server processing time and, from the client's perspective, response time.

    2. A decreasing RPS would only be inconsistent here if the load generator issued requests at a constant frequency, regardless of whether responses had arrived. In reality, after issuing a request each VU waits until the response is received. Since the server responses got slower, that wait time increased, which causes a decrease in RPS. There is a second answer to this question: the load generator's performance capacity is limited, so emulating more VUs requires more resources on the client, which can delay the issuing of requests. Even though you configured your test with zero think time, the load generator can inadvertently inject delays that cause an additional decrease in RPS.

    3. The total number of requests depends on both the number of VUs and the rate each VU can sustain (i.e. on the overall RPS). Apparently in your case the RPS reduction had the bigger impact, and the total number of requests diminished (see the numeric sketch at the end of this answer).

    Generally, the effect of decreasing RPS caused by increasing VUs in load testing looks like a paradox, when in reality it is not.
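    To make the arithmetic in points 2 and 3 concrete: in a closed-loop test with zero think time, Little's law applies, so RPS ≈ VUs / average response time, and the total over a fixed run is simply RPS × duration. The sketch below uses the 2-minute duration and the ~500 / ~460 RPS figures from the question; the VU counts and response times at the end are invented purely to illustrate the effect.

    ```python
    def closed_loop_rps(vus: int, avg_response_s: float) -> float:
        """Little's law for a closed-loop test with zero think time:
        each VU completes 1 / avg_response_s requests per second."""
        return vus / avg_response_s

    def total_requests(rps: float, duration_s: float) -> float:
        """Total requests completed over a fixed-length run."""
        return rps * duration_s

    DURATION_S = 120  # the 2-minute test from the question

    print(total_requests(500, DURATION_S))  # 60000.0 -> roughly the 60k first run
    print(total_requests(460, DURATION_S))  # 55200.0 -> roughly the 55k second run

    # Hypothetical VU counts and latencies: doubling the VUs does not help
    # if the average response time more than doubles under the extra load.
    print(closed_loop_rps(50, 0.100))   # 500.0 RPS at 100 ms per request
    print(closed_loop_rps(100, 0.217))  # ~460.8 RPS at 217 ms per request
    ```

    In other words, once the extra VUs push the average response time up more than proportionally, both the RPS and the 2-minute total drop, which is exactly what the second run shows.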