
GNU parallel saturates one server instead of distributing jobs equally


I am using GNU parallel 20160222. I have four servers configured in my ~/.parallel/sshloginfile:

48/big1
48/big2
8/small1
8/small2

When I run, say, 32 jobs, I'd expect parallel to start eight on each server, or even better, two or three each on small1 and small2 and twelve or so each on big1 and big2. Instead, it starts 8 jobs on small2 and runs the remaining jobs locally.

Here is my invocation (I actually use a --profile but I removed it for simplicity):

parallel --verbose --workdir . --sshdelay 0.2 --controlmaster --sshloginfile .. \
    "my_cmd {} | gzip > {}.gz" ::: $(seq 1 32)

Here is the main question:

  1. Is there an option I am missing that would allocate jobs more equally?

And a related question:

  2. Is there a way to specify --memfree, --load, etc. per server? Especially --memfree.

Solution

  • I recall GNU Parallel used to fill job slots "from one end". This did not matter if you had far more jobs than job slots: all job slots (both local and remote) would fill up.

    It did, however, matter if you had fewer jobs, so it was changed: GNU Parallel today hands out jobs to sshlogins in a round-robin fashion, spreading them more evenly.

    Unfortunately I do not recall in which version this change was made, but you can tell whether your version does it by running:
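As an illustration of the round-robin idea (this is a sketch, not GNU Parallel's actual scheduler, and it ignores the slot counts from the sshloginfile), 32 jobs handed out round-robin across the four sshlogins land eight per server:

```shell
# Sketch only: round-robin assignment of 32 jobs across four sshlogins.
hosts="big1 big2 small1 small2"
i=0
for job in $(seq 1 32); do
    set -- $hosts            # reset positional parameters to the host list
    shift $(( i % 4 ))       # rotate to the next host
    echo "job $job -> $1"
    i=$(( i + 1 ))
done
```

Each host receives 8 of the 32 jobs; with the slot weights taken into account, the real scheduler can still skew toward servers with more slots, but it no longer fills a single server first.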

    parallel -vv -t
    

    and looking at which sshlogins are being used.

    Re: --memfree

    You can build your own using --limit.

    I am curious why you want different limits for different servers. The idea behind --memfree is that it is set to the amount of RAM that a single job takes. So if there is enough RAM for a single job, a new job can be started - no matter on which server.

    You clearly have a different situation, so please explain it.
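As a sketch of building a per-server --memfree on top of --limit: --limit runs a command before starting a job and only starts one when that command exits 0 (a non-zero exit makes parallel hold off). The script name and the MEMFREE_KB variable below are my own assumptions, not part of GNU Parallel; the idea is to set a different threshold on each server, e.g. in that server's shell profile.

```shell
#!/bin/sh
# memfree_limit.sh - hypothetical helper for --limit (the name and the
# MEMFREE_KB variable are assumptions, not part of GNU Parallel).
# Exit 0: enough memory is available on this host, start a job.
# Exit 1: not enough, hold off.
MEMFREE_KB=${MEMFREE_KB:-1048576}   # per-server threshold, default 1 GiB
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
[ "${avail_kb:-0}" -ge "$MEMFREE_KB" ]
```

You would then invoke something like parallel --limit ./memfree_limit.sh ... (assuming your GNU Parallel is new enough to have --limit). Note the script reads /proc/meminfo, so this sketch is Linux-only.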

    Re: upgrading

    Look into parallel --embed.