I have a scenario where my script runs many sem commands simultaneously; here I am running about 1000 of them at once.
filename: sem_script.sh
#!/usr/bin/bash
# the first script argument is the parameter to work on
param=$1
fun() {
# do something with the parameter
echo "$1"
}
export -f fun
sem --id someid --fg fun "$param"
The reason I am using sem is that I want fun to run one after another. So if I do
sh sem_script.sh "test" &
sh sem_script.sh "test" &
sh sem_script.sh "test" &
sh sem_script.sh "test" &
sh sem_script.sh "test" &
sh sem_script.sh "test" &
...
..
... more than 1000 times
sh sem_script.sh "test" &
sh sem_script.sh "test" &
sh sem_script.sh "test" &
then it will output
test
test
test
test
test
test
...
..
... more than 1000 times
test
test
test
The problem is that this starts 1000 sem commands at once, all waiting in the queue to run one after another. This chokes my CPU and RAM, and everything jams.
So I decided that I don't want to allow more than 4 sem commands to be queued for a particular id (here, someid).
What I want is something like this:
#!/usr/bin/bash
fun() {
#dosomething with the $param
echo $1
}
export -f fun
num_sem_instances = get how many sem instances are running with id someid
if(num_sem_instances < 4), then {
#allow to create a sem instance
sem --id someid --fg fun $param
}
else {
#don't create a sem instance
echo "already have 4 instances of sem with id=someid"
# rerun the script again and try your luck
sh sem_script.sh "test" &
}
Because the scripts may execute simultaneously, the above logic may not work reliably; it only works when there is some delay between the scripts.
Better than the above logic, I would strongly prefer an option in the sem command itself that allows only 4 instances with the id someid to exist at any point in time on my PC and refuses the rest. Is there such an option?
When a sem is running, it adds a pidfile to ~/.parallel/semaphores/id-someid/, so you should be able to count the files there that have pids in their names.
I just ran sem --id someid -j2 sleep 10 twice on the command line and listed the contents of that directory:
[user@laptop ~]$ ls -lah .parallel/semaphores/id-someid/
total 8.0K
drwxrwxr-x. 2 user user 4.0K Jul 9 09:47 .
drwxrwxr-x. 3 user user 4.0K Jul 9 09:47 ..
-rw-rw-r--. 3 user user 0 Jul 9 09:47 19428@laptop.wks
-rw-rw-r--. 3 user user 0 Jul 9 09:47 19449@laptop.wks
-rw-rw-r--. 3 user user 0 Jul 9 09:47 id-someid
So in your script, I would put (note that bash does not allow spaces around the = in an assignment):
num_sem_instances=$(find ~/.parallel/semaphores/id-${YOURID}/ -type f 2> /dev/null | awk -F/ '{print $NF}' | grep '^[0-9]' | wc -l)
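The counting step above can also be wrapped in a small function, which makes it easier to reuse and to try out against a scratch directory. This is only a sketch: the name count_sem_pidfiles is made up for illustration, and it assumes pidfiles are named PID@host as in the listing above, which is observed behaviour rather than a documented interface of sem. Unlike the find pipeline, it does not descend into subdirectories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the pidfiles in a semaphore directory.
# Assumption: pidfile names start with a digit (PID@host), as in the listing above.
count_sem_pidfiles() {
    local dir=$1 count=0 f
    for f in "$dir"/[0-9]*; do
        # An unmatched glob stays literal, so confirm the file really exists.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example: count the entries for the id "someid" (prints 0 if the directory is absent).
count_sem_pidfiles "$HOME/.parallel/semaphores/id-someid"
```

In a script this could be used as num_sem_instances=$(count_sem_pidfiles ~/.parallel/semaphores/id-${YOURID}).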
EDIT:
If only one sem can run at a time (i.e. -j1), and only four instances of the command can be queued at once, the sem could be wrapped in another parallel process, which queues up the task only after counting the queued commands:
# a one-line function body needs a ; before the closing brace
fun () { echo "$1"; sleep 1; }
runfun () {
numqueued=$(find ~/.parallel/semaphores/id-queued/ -type f 2> /dev/null | awk -F/ '{print $NF}' | grep '^[0-9]' | wc -l)
# -lt compares numbers; a bare < inside [ ] would be treated as a redirection
if [ "$numqueued" -lt 4 ]; then
parallel -j4 --bg --id queued sem --id funid --fg fun "$1"
else
echo "too much fun right now"
fi
}
export -f fun
runfun "$1"
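The check-then-enqueue pattern in runfun can be separated from the command it guards, which makes the gate itself testable without GNU parallel installed. The sketch below is an illustration, not part of sem: the function name run_if_below_limit and its argument order are hypothetical, and it again assumes pidfile names start with a digit.

```shell
#!/usr/bin/env bash
# Hypothetical gate: run the given command only while fewer than <limit>
# pidfiles exist in <dir>; otherwise report and return non-zero.
# Usage: run_if_below_limit <dir> <limit> <command> [args...]
run_if_below_limit() {
    local dir=$1 limit=$2 n=0 f
    shift 2
    # Count files whose names start with a digit (assumed PID@host pidfiles).
    for f in "$dir"/[0-9]*; do
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    if [ "$n" -lt "$limit" ]; then
        "$@"
    else
        echo "too much fun right now" >&2
        return 1
    fi
}
```

It might be called as run_if_below_limit ~/.parallel/semaphores/id-queued 4 parallel -j4 --bg --id queued sem --id funid --fg fun "$1". Note that counting and enqueueing are still two separate steps, so two scripts started at the same instant can both see a count of 3 and both enqueue; the limit is therefore approximate rather than strict.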