I'm currently running benchmarks for my studies using YCSB against an ArangoDB cluster (v3.7.3) that I set up with the ArangoDB Starter.
I'm trying to understand whether, and how, a setup like this (I'm using 4 VMs, for example) helps balance request load. If I have nodes A, B, C and D and give YCSB the IP of node A, all requests go to node A...
That would mean a cluster is unnecessary for balancing request load, wouldn't it? It would only make sense for data replication.
How would I balance the request load then? I'd normally do that in my application, but I can't if I use existing tools like YCSB... (or can I?)
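For comparison, balancing in the application usually means the client spreads its requests across all coordinators itself instead of pinning a single address. A minimal Python sketch, assuming one coordinator per VM on port 8529; the hostnames `node-a` through `node-d` and the class `RoundRobinEndpoints` are hypothetical:

```python
from itertools import cycle

# Hypothetical coordinator endpoints, one per VM; replace with your own nodes.
COORDINATORS = [
    "http://node-a:8529",
    "http://node-b:8529",
    "http://node-c:8529",
    "http://node-d:8529",
]


class RoundRobinEndpoints:
    """Hands out coordinator URLs in turn so requests are spread evenly."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self._endpoints = cycle(endpoints)  # endless A, B, C, D, A, ... iterator

    def pick(self):
        """Return the coordinator that should receive the next request."""
        return next(self._endpoints)


if __name__ == "__main__":
    rr = RoundRobinEndpoints(COORDINATORS)
    for _ in range(6):
        print(rr.pick())  # each request would be sent to this base URL
```

A tool like YCSB that takes a single host cannot do this rotation for you, which is why a proxy in front of the coordinators is the usual workaround.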
Thanks for the help!
I had this problem as well and ended up solving it by standing up nginx in front of my cluster's coordinators, which gives clients a single stable address and a language-independent way to distribute query load. I found nginx surprisingly simple to configure; see the documentation of its upstream module for more details.
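# Goes inside the http { } context, e.g. as a file under conf.d/
upstream arangodb {
    # One entry per ArangoDB coordinator; requests are distributed round-robin by default
    server coord-1.my.domain:8529;
    server coord-2.my.domain:8529;
    server coord-3.my.domain:8529;
}

server {
    listen *:80 default_server;
    server_name _;  # catch-all name; default_server makes this block answer for any hostname
    # On a connection error, timeout or invalid response header, retry on the next coordinator
    proxy_next_upstream error timeout invalid_header;

    location / {
        proxy_pass http://arangodb;
    }
}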
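To make the behaviour of this config concrete, here is a toy Python model (not nginx's actual implementation) of what the upstream block plus `proxy_next_upstream` does: each new request goes to the next server in turn, and a request that fails with a connection error or timeout is retried on the following server. Real nginx additionally marks failing servers as temporarily unavailable (see the `max_fails` and `fail_timeout` parameters of `server`). The names `UpstreamPool` and `fake_transport` are hypothetical:

```python
from typing import Callable, Optional, Sequence


class UpstreamPool:
    """Toy model of an nginx upstream: round-robin with retry on failure.

    Illustration only, not nginx's real algorithm (no weights, no max_fails).
    """

    def __init__(self, servers: Sequence[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self.servers = list(servers)
        self._next = 0  # index of the server that gets the next new request

    def send(self, request: str, transport: Callable[[str, str], str]) -> str:
        start = self._next
        self._next = (self._next + 1) % len(self.servers)
        last_error: Optional[Exception] = None
        # Try each server at most once, starting at this request's round-robin slot.
        for offset in range(len(self.servers)):
            server = self.servers[(start + offset) % len(self.servers)]
            try:
                return transport(server, request)
            except (ConnectionError, TimeoutError) as exc:
                # Like "proxy_next_upstream error timeout": move on to the next server.
                last_error = exc
        raise ConnectionError("all upstream servers failed") from last_error


def fake_transport(server: str, request: str) -> str:
    """Hypothetical transport in which coord-2 is down."""
    if server == "coord-2":
        raise ConnectionError(f"{server} is down")
    return f"{server} handled {request}"


if __name__ == "__main__":
    pool = UpstreamPool(["coord-1", "coord-2", "coord-3"])
    for i in range(3):
        print(pool.send(f"req{i}", fake_transport))
```

With coord-2 down, the three requests are answered by coord-1, coord-3 and coord-3: the second request hits the dead server and fails over, so the client never sees the error.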
It's not ideal for every setup, but it works well when you just need to load-balance and don't want to write a bunch of code (which ends up being quite slow) to resolve coordinators.
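If you'd rather not hard-code the coordinator list, one option is to generate the upstream block from the cluster itself. ArangoDB coordinators expose `GET /_api/cluster/endpoints`, which returns an `endpoints` array of objects whose `endpoint` attribute is a string such as `tcp://10.0.0.1:8529`; the sketch below assumes that response shape and HTTP Basic auth, so check both against your version. The function names, credentials and sample addresses are hypothetical:

```python
import base64
import json
import urllib.request
from urllib.parse import urlsplit


def fetch_endpoints(coordinator_url: str, user: str, password: str, timeout: float = 5.0) -> dict:
    """Ask one coordinator for the list of all coordinator endpoints."""
    req = urllib.request.Request(coordinator_url.rstrip("/") + "/_api/cluster/endpoints")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")  # assumes Basic auth is enabled
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def coordinator_hosts(response: dict) -> list:
    """Turn entries like 'tcp://10.0.0.1:8529' into 'host:port' strings for nginx.

    The scheme is dropped, so ssl:// endpoints would need proxy_pass https:// instead.
    """
    return [urlsplit(e["endpoint"]).netloc for e in response.get("endpoints", [])]


def render_upstream(name: str, hosts: list) -> str:
    """Render an nginx upstream block with one server line per coordinator."""
    lines = [f"upstream {name} {{"]
    lines += [f"    server {host};" for host in hosts]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample response; against a live cluster you would call e.g.
    # fetch_endpoints("http://coord-1.my.domain:8529", "root", "secret") instead.
    sample = {"error": False, "code": 200, "endpoints": [
        {"endpoint": "tcp://10.0.0.1:8529"},
        {"endpoint": "tcp://10.0.0.2:8529"},
    ]}
    print(render_upstream("arangodb", coordinator_hosts(sample)))
```

You could write the output to a file under conf.d/ and run `nginx -s reload` whenever coordinators are added or removed, keeping the discovery logic out of your application entirely.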
I've asked ArangoDB for a native proxy server solution, but I'd bet it's low on their to-do list, as it could be tricky to support given the huge number of configuration options.