cluster-computing load-balancing arangodb

ArangoDB Request Load balancing

I'm currently running benchmarks for my studies using YCSB on an ArangoDB cluster (v3.7.3) that I set up using the starter.

I'm trying to understand whether and how a setup like that (I'm using 4 VMs, for example) helps with balancing request load. If I have nodes A, B, C and D and I tell YCSB the IP of node A, all the requests go to node A...

That would mean that a cluster is unnecessary if you want to balance request load, wouldn't it? It would just make sense for data replication.

How would I handle the request load then? I'd normally do that in my application, but I cannot do that if I use existing tools like YCSB... (or can I?)

Thanks for the help!

Solution

I had this problem as well and solved it by standing up nginx in front of my cluster, which gives a stable, language-independent way to distribute query load across the coordinators. I found nginx surprisingly simple to configure; take a look at its upstream module for more details.

upstream arangodb {
  server coord-1.my.domain:8529;
  server coord-2.my.domain:8529;
  server coord-3.my.domain:8529;
}

server {
  listen                *:80 default_server;
  server_name           _; # Listens for ALL hostnames
  proxy_next_upstream   error timeout invalid_header;
  
  location / {
    proxy_pass          http://arangodb;
  }
}
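
By default the upstream block above balances round-robin. If request durations vary a lot, a variant along the following lines may spread load more evenly and temporarily skip a coordinator that stops responding. This is only a sketch: the hostnames are the same placeholders as above, and the `max_fails` and `fail_timeout` values are assumptions to tune for your own setup, not recommendations from ArangoDB.

```nginx
upstream arangodb {
  # Pick the coordinator with the fewest active connections
  # instead of plain round-robin.
  least_conn;

  # Assumed thresholds: after 3 failed attempts within 30s, nginx
  # stops sending requests to that coordinator for 30s.
  server coord-1.my.domain:8529 max_fails=3 fail_timeout=30s;
  server coord-2.my.domain:8529 max_fails=3 fail_timeout=30s;
  server coord-3.my.domain:8529 max_fails=3 fail_timeout=30s;
}
```

The `server { ... }` block stays unchanged. The benchmark client is then pointed at the nginx host and port rather than at an individual coordinator.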

It's not ideal for everyone but works well for those times when you just need to load-balance and don't want to write a bunch of code (which ends up being quite slow) to resolve coordinators.

I've asked ArangoDB for a native proxy server solution, but I would bet it's low on their to-do list as it could be tricky to support, given the huge number of configuration options.