R and GNU Parallel - How to limit number of cores used

(New to GNU Parallel)

My aim is to run the same Rscript, with different arguments, over multiple cores. My first problem is to get this working on my laptop (2 real cores, 4 virtual), then I will port this over to one with 64 cores.

Currently:

I have an R script, "Test.R", which takes arguments, does something (say, adds some numbers and writes the result to a file), then stops.

I have a "commands.txt" file containing the following:

/Users/name/anaconda3/lib/R/bin/Rscript Test.R 5 100 100
/Users/name/anaconda3/lib/R/bin/Rscript Test.R 5 100 1000
/Users/name/anaconda3/lib/R/bin/Rscript Test.R 5 100 1000
/Users/name/anaconda3/lib/R/bin/Rscript Test.R 5 100 1000
/Users/name/anaconda3/lib/R/bin/Rscript Test.R 50 100 1000
/Users/name/anaconda3/lib/R/bin/Rscript Test.R 50 200 1000

Each line tells GNU Parallel to run Test.R with Rscript (I installed R using Anaconda).

In the terminal (after navigating to the desktop, which is where Test.R and commands.txt are), I use the command:

parallel --jobs 2 < commands.txt

What I want this to do is use 2 cores and run the commands from commands.txt until all tasks are complete. (I have tried variations on this command, such as changing the 2 to a 1. In that case, 2 of the cores run at 100% and the other 2 run at around 20-30%.)

When I run this, all 4 cores go to 100% (as seen in htop) and the first 2 jobs complete, but no more jobs complete after that, despite all 4 cores still being at 100%.

When I run the same command on the 64-core computer, all 64 cores go to 100%, and I have to cancel the jobs.

Any advice on resources to look at, or what I am doing wrong would be greatly appreciated.

Bit of a long question, let me know if I can clarify anything.

The output from htop, as requested, while running the above command (sorted by CPU%):

   1  [||||||||||||||||||||||||100.0%]   Tasks: 490, 490 thr; 4 running
   2  [|||||||||||||||||||||||||99.3%]   Load average: 4.24 3.46 4.12 
   3  [||||||||||||||||||||||||100.0%]   Uptime: 1 day, 18:56:02
   4  [||||||||||||||||||||||||100.0%]
   Mem[|||||||||||||||||||5.83G/8.00G]
   Swp[||||||||||          678M/2.00G]

   PID USER      PRI  NI  VIRT   RES S CPU% MEM%   TIME+  Command
  9719 user     16   0 4763M  291M ? 182.  3.6  0:19.74 /Users/user/anaconda3
  9711 user     16   0 4763M  294M ? 182.  3.6  0:20.69 /Users/user/anaconda3
  7575 user     24   0 4446M 94240 ? 11.7  1.1  1:52.76 /Applications/Utilities
  8833 user     17   0 86.0G  259M ?  0.8  3.2  1:33.25 /System/Library/StagedF
  9709 user     24   0 4195M  2664 R  0.2  0.0  0:00.12 htop
  9676 user     24   0 4197M 14496 ?  0.0  0.2  0:00.13 perl /usr/local/bin/par

Solution

Based on the output from htop, each /Users/name/anaconda3/lib/R/bin/Rscript process uses more than one CPU thread (182%). You have 4 CPU threads, and since you are running 2 Rscripts at once, we cannot tell whether a single Rscript would eat all 4 CPU threads if it ran by itself. It may well eat every CPU thread that is available (your test on the 64-core machine suggests this).
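One way to test that guess is to cap the thread count of the numerical libraries before starting the jobs. This is only a sketch, and it rests on an assumption the question does not confirm: that the extra threads come from a threaded BLAS or OpenMP runtime bundled with the Anaconda R build. If Test.R starts its own workers (for example via the parallel package), these variables will not help.

```shell
# Assumption: the extra threads come from OpenMP, MKL, or OpenBLAS.
# Each variable caps one of those runtimes at a single thread per process.
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1

# Exported variables are inherited by child processes, so every Rscript
# started by GNU Parallel from this shell would see the same limits.
sh -c 'echo "OMP_NUM_THREADS seen by a child process: $OMP_NUM_THREADS"'
```

With the variables exported, rerunning parallel --jobs 2 < commands.txt from the same shell and watching htop would show whether each Rscript now stays near 100% instead of 182%.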

If you are using GNU/Linux you can limit which CPU threads a program can use with taskset:

taskset 9 parallel --jobs 2 < commands.txt

This should force GNU Parallel (and all its children) to use only CPU threads 1 and 4 (9 in binary: 1001). Running it that way should therefore limit the two jobs to two threads.

By using 9 (1001 binary) or 6 (0110 binary) we are reasonably sure that the two CPU threads are on two different cores. 3 (11 binary) might refer to the two threads on the same CPU core and would therefore probably be slower. The same goes for 5 (101 binary).
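To make the mask arithmetic concrete, here is a small sketch that lists which CPU numbers a mask selects. The function name mask_to_cpus is made up for illustration and is not part of taskset. Two details worth knowing: taskset reads a bare mask as hexadecimal, and it numbers CPUs from zero, so "thread 1 and 4" above corresponds to CPUs 0 and 3.

```shell
# Hypothetical helper (not part of taskset): print the zero-based CPU
# numbers selected by a hexadecimal affinity mask.
mask_to_cpus() {
  mask=$((0x$1))   # interpret the argument as hex, as taskset does
  cpu=0
  out=""
  while [ "$mask" -gt 0 ]; do
    # If the lowest bit is set, this CPU is part of the mask.
    if [ $((mask & 1)) -eq 1 ]; then
      out="$out${out:+ }$cpu"
    fi
    mask=$((mask >> 1))
    cpu=$((cpu + 1))
  done
  echo "$out"
}

mask_to_cpus 9   # prints: 0 3
mask_to_cpus 6   # prints: 1 2
mask_to_cpus 3   # prints: 0 1
```

The same restriction can be written as a CPU list instead of a mask, for example taskset -c 0,3 parallel --jobs 2 < commands.txt. Which CPU numbers share a physical core depends on the machine, so it is worth checking the topology (for instance with lscpu -e on Linux) before picking a mask.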

In general you want to use as many CPU threads as possible as that will typically make the computation faster. It is unclear from your question why you want to avoid this.

If you are sharing the server with others, a better solution is to use nice. This way you can use all the CPU power that others are not using.
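A minimal sketch of the nice approach, assuming GNU Parallel is installed and commands.txt is in the current directory (the guard simply skips the job list when either is missing). Child processes inherit the niceness, so every Rscript started by GNU Parallel runs at the lowered priority.

```shell
# Run a harmless command first: with no arguments, nice prints the
# niceness it is running at, so this shows the adjustment taking effect.
nice -n 19 nice

# Start the job list at the lowest priority. The Rscripts inherit it and
# only get CPU time that other users' processes are not asking for.
if command -v parallel >/dev/null 2>&1 && [ -r commands.txt ]; then
  nice -n 19 parallel < commands.txt
fi
```

GNU Parallel also has a --nice option (for example parallel --nice 19 < commands.txt), which sets the niceness of the jobs without an outer nice.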