amazon-web-services distributed-computing ejabberd riak amazon-elb

CPU and memory utilization discrepancies for ejabberd and Riak clusters on AWS

I'm running a 2-node ejabberd cluster (behind an elastic load balancer) that in turn connects with a 3-node Riak cluster (again, via an ELB) on AWS. When I load-test the platform via Tsung (creating 0.5 million user registrations), I notice that the CPU utilization for the ejabberd nodes differs amongst themselves by around 10%. For the Riak nodes, the CPU and memory utilization amongst nodes differs by around 5%.

The nodes are of identical configuration, so wondering what could be leading to these non-trivial differences in utilization. Can anyone throw some light here please / educate me?

Is it due to the load balancer? Or a network impact? I expect that once a cluster is formed (either of ejabberd or of Riak KV), the nodes are all identical in behavior, especially for ejabberd where the entire database is replicated across the cluster.

Not that these differences are a problem, but would be good to understand the inner workings of the clusters here...

Many thanks.

Solution

Elastic Load Balancing mechanism

DNS server uses DNS round robin to determine which load balancer node in a specific Availablity Zone will receive the request
The selected load balancer checks for "sticky session" cookie
The selected load balancer sends the request to the least loaded instance

And in greater details:

Availability Zones (unlikely your case)

By default, the load balancer node routes traffic to back-end instances within the same Availability Zone. To ensure that your back-end instances are able to handle the request load in each Availability Zone, it is important to have approximately equivalent numbers of instances in each zone. For example, if you have ten instances in Availability Zone us-east-1a and two instances in us-east-1b, the traffic will still be equally distributed between the two Availability Zones. As a result, the two instances in us-east-1b will have to serve the same amount of traffic as the ten instances in us-east-1a.

Sessions (most likely your case)

By default a load balancer routes each request independently to the server instance with the smallest load. By comparison, a sticky session binds a user's session to a specific server instance so that all requests coming from the user during the session will be sent to the same server instance.

AWS Elastic Beanstalk uses load balancer-generated HTTP cookies when sticky sessions are enabled for an application. The load balancer uses a special load balancer–generated cookie to track the application instance for each request. When the load balancer receives a request, it first checks to see if this cookie is present in the request. If so, the request is sent to the application instance specified in the cookie. If there is no cookie, the load balancer chooses an application instance based on the existing load balancing algorithm. A cookie is inserted into the response for binding subsequent requests from the same user to that application instance. The policy configuration defines a cookie expiry, which establishes the duration of validity for each cookie.

Routing Algorithm (less likely your case)

Load balancer node sends the request to healthy instances within the same Availability Zone using the leastconns routing algorithm. The leastconns routing algorithm favors back-end instances with the fewest connections or outstanding requests.

Source: Elastic Load Balancing Terminology And Key Concepts

Hope it helps.