
bash sem - limit number of sem commands based on id


I have a scenario in my script where I run many sem commands simultaneously, around 1000 at a time.

filename: sem_script.sh

#!/usr/bin/bash
param=$1

fun() {
  # do something with the parameter
  echo "$1"
}
export -f fun

sem --id someid --fg fun "$param"

I am using sem because I want the fun calls to run one after another.

So if I run

sh sem_script.sh "test" &
sh sem_script.sh "test" &
sh sem_script.sh "test" &
sh sem_script.sh "test" &
sh sem_script.sh "test" &
sh sem_script.sh "test" &
...
..
... more than 1000 times
sh sem_script.sh "test" &
sh sem_script.sh "test" &
sh sem_script.sh "test" &

then it will output

test
test
test
test
test
test
...
..
... more than 1000 times
test
test
test

The problem is that this starts 1000 sem commands at once, and they all sit in the queue waiting to run one after another. This chokes my CPU and RAM, and everything jams.

So I decided that I don't want more than 4 sem commands in the queue for a particular id (here, someid).

What I want is something like the following pseudocode:

#!/usr/bin/bash
fun() {
  #dosomething with the $param
  echo $1
}
export -f fun

num_sem_instances = get how many sem instances are running with id someid
if(num_sem_instances < 4), then {
  #allow to create a sem instance 
  sem --id someid --fg fun $param
}
else {
  #dont create an sem instance
  echo "already have 4 instances of sem with id=someid"
  # rerun the script again and try your luck
  sh sem_script.sh "test" &
}

Because the scripts execute simultaneously, the above logic may not work: several scripts can read the same count before any of them has started its sem instance. It only works when there is some delay between the scripts.

Rather than the logic above, I would strongly prefer an option in the sem command itself that allows only 4 instances with id someid to exist at any point in time on my PC and refuses to start the rest. Is there such an option?


Solution

  • When a sem is running, it adds a pidfile to ~/.parallel/semaphores/id-someid/, so you should be able to count the files there whose names start with a pid.

    I just ran sem --id someid -j2 sleep 10 twice on the command line and listed the contents of that directory:

    [user@laptop ~]$ ls -lah .parallel/semaphores/id-someid/
    total 8.0K
    drwxrwxr-x. 2 user user 4.0K Jul  9 09:47 .
    drwxrwxr-x. 3 user user 4.0K Jul  9 09:47 ..
    -rw-rw-r--. 3 user user    0 Jul  9 09:47 19428@laptop.wks
    -rw-rw-r--. 3 user user    0 Jul  9 09:47 19449@laptop.wks
    -rw-rw-r--. 3 user user    0 Jul  9 09:47 id-someid
    

    So in your script, I would put

    num_sem_instances=$(find ~/.parallel/semaphores/id-${YOURID}/ -type f 2> /dev/null | awk -F/ '{print $NF}' | grep '^[0-9]' | wc -l)
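
    If you need that count in more than one place, it could be wrapped in a small function. This is only a sketch: count_sem_instances and its optional second argument (the semaphore base directory) are names made up here, and it assumes that pidfile names start with the numeric pid, as in the listing above.

    ```shell
    # Count the pidfiles that sem keeps for one semaphore id.
    # $1: semaphore id (e.g. someid)
    # $2: base directory (hypothetical parameter; defaults to the
    #     directory shown in the listing above)
    count_sem_instances() {
      local id="$1"
      local semdir="${2:-$HOME/.parallel/semaphores}"
      local count=0
      local f
      for f in "$semdir/id-$id"/[0-9]*; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -f "$f" ]; then
          count=$((count + 1))
        fi
      done
      echo "$count"
    }
    ```

    It would then be called as num_sem_instances=$(count_sem_instances someid).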
    

    EDIT:

    If only one sem can run at a time (i.e. -j1), and only four instances of the command can be queued at once, the sem could be wrapped in another parallel process, which queues up the task only after counting the queued commands:

    fun () { echo "$1"; sleep 1; }
    
    runfun () {
      numqueued=$(find ~/.parallel/semaphores/id-queued/ -type f 2> /dev/null | awk -F/ '{print $NF}' | grep '^[0-9]' | wc -l)
      if [ "$numqueued" -lt 4 ]; then
        parallel -j4 --bg --id queued sem --id funid --fg fun "$1"
      else
        echo "too much fun right now"
      fi
    }
    
    export -f fun
    
    runfun "$1"
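
Counting pidfiles and then starting sem are two separate steps, so two scripts that start at the same moment can both see a count below 4. One way around that, which is a different technique from the pidfile count and not a feature of sem, is to let each script claim one of four slot directories with mkdir before it calls sem. mkdir either creates the directory or fails, so a slot can only have one owner. A minimal sketch, assuming bash; SLOT_DIR, MAX_SLOTS, acquire_slot, release_slot and run_limited are hypothetical names:

```shell
# Hypothetical settings: where slots live and how many callers may hold one.
SLOT_DIR=${SLOT_DIR:-${TMPDIR:-/tmp}/sem-slots-someid}
MAX_SLOTS=${MAX_SLOTS:-4}

# Try to claim a free slot. Prints the slot path and returns 0 on success,
# returns 1 if every slot is already taken.
acquire_slot() {
  mkdir -p "$SLOT_DIR" || return 1
  local i=1
  while [ "$i" -le "$MAX_SLOTS" ]; do
    # mkdir fails if the directory already exists, so this claim is atomic.
    if mkdir "$SLOT_DIR/slot.$i" 2>/dev/null; then
      echo "$SLOT_DIR/slot.$i"
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

# Give a slot back so another script can claim it.
release_slot() {
  rmdir "$1"
}

# Run a command through sem only if a slot is free.
run_limited() {
  local slot status
  if slot=$(acquire_slot); then
    sem --id someid --fg "$@"
    status=$?
    release_slot "$slot"
    return "$status"
  fi
  echo "already have $MAX_SLOTS instances of sem with id=someid" >&2
  return 1
}
```

With export -f fun in place, the script would end with run_limited fun "$1" instead of calling sem directly. If a script is killed while it holds a slot, its slot directory stays behind and has to be removed by hand.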