
Load generated by distributed worker processes is not equivalent to load generated by a single process


I ran another test, trying to figure out how users are allocated to worker nodes.

Here is my locust file.

from locust import HttpUser, between, task

@task
def mytask(self):
    # Module-level task; Locust passes the running user instance as `self`.
    self.client.get("/")

class QuickstartUser(HttpUser):
    wait_time = between(1, 2)  # each simulated user waits 1-2 s between tasks
    tasks = [mytask]

It does nothing but access a Chinese search engine website that has never failed to respond.

When I start 30 users on a single node, the RPS is about 20.

locust -f try_baidu.py

and got the running status and results shown in the web UI (screenshots omitted).
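For a reproducible single-node run without the web UI, the same test can be started headlessly. This is a sketch assuming Locust 1.x or later; older releases used --no-web and -c instead of --headless and -u.

locust -f try_baidu.py --headless -u 30 -r 10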

I then switched to distributed mode, using the following commands in 3 terminals on my computer.

locust -f try_baidu.py --master #for master node
locust -f try_baidu.py --worker --master-host=127.0.0.1 #for worker node each
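Equivalently, the distributed run can be scripted headlessly so the master waits for its workers before spawning users. Again a sketch assuming Locust 1.x or later; --expect-workers makes the master block until 2 workers have connected.

locust -f try_baidu.py --master --headless -u 30 -r 10 --expect-workers 2  #master
locust -f try_baidu.py --worker --master-host=127.0.0.1                    #each worker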

I entered the same user count and hatch rate in the Locust UI as above: 30 users and a hatch rate of 10.

I got roughly the same RPS (around 20), and each worker node ran 15 users.

This shows that the user count entered in the UI is the total to be simulated, dispersed across the worker nodes, something like load balancing of the load generation.
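That total also matches a back-of-the-envelope estimate: with wait_time = between(1, 2) and a fast target site, each user completes roughly one request every 1.5 s on average, so 30 users should produce about 20 RPS. A minimal sketch of the arithmetic, assuming the response time is negligible next to the wait time:

# Rough RPS estimate; assumes response time << wait time.
users = 30
avg_wait = (1 + 2) / 2   # between(1, 2) averages 1.5 s per task
print(users / avg_wait)  # 20.0, matching the observed ~20 RPS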

But I don't know why the same number of users gives 2 different RPS figures when running on a single node (Scenario 1) versus distributed (Scenario 2). They should result in the same or nearly the same RPS, as in the test above.

The only difference I can tell is that the comparison above ran on one computer, while Scenario 2 has its worker nodes on 2 remote Linux VMs. But is that the real reason?


My question may not have been asked very clearly, so I'm adding some test results here to show what I get when running distributed versus on a single node with a specified number of users.

  • Scenario 1: Single Node

CPU utilization is about 50%

  • Scenario 2: 3 worker processes, each supposedly running 30 users, but the RPS is even lower.

CPU utilization is about 30%

From the console I can see that each worker process starts 30 users as expected, but I have no idea why the RPS is only about 1/3 of the single-node figure (the sketch after this list quantifies the gap).

  • Scenario 3: Tripling the users to 90 per worker process gives almost the same RPS as the single-node run.

CPU utilization is about 50%

Scenario 3 seems to deliver what I expected for triple the simulated load. But why does the Locust graphic panel show each worker process running 90 users?

  • Scenario 4: To make sure Locust truly distributes the specified users to each worker node, I put 30 users on a single worker node and got the same RPS as the single-node (non-distributed) run.

CPU utilization is about 50%

Do I have to add up the total number of users I want distributed across the worker nodes and enter that total?
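Applying the same rough estimate as above to these scenarios highlights the inconsistency; the helper below is hypothetical (not part of Locust) and assumes a negligible response time:

# expected_rps is a hypothetical helper, not a Locust API.
def expected_rps(total_users, avg_wait=1.5):
    return total_users / avg_wait

print(expected_rps(30))      # 20.0  -> Scenarios 1 and 4
print(expected_rps(3 * 30))  # 60.0  -> Scenario 2, if all 90 users really ran
print(expected_rps(3 * 90))  # 180.0 -> Scenario 3, if all 270 users really ran

If each worker really ran the number of users shown, Scenarios 2 and 3 should be well above 20 RPS rather than at or below it, which is what makes the reported numbers so puzzling.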


Solution

  • The problem was solved to some extent by moving the master node to another system. I guess something was interfering with the master collecting the request stats sent from one of the workers, which resulted in half the expected RPS. This may point to a fix when RPS is lower than it should be.