Tailing multiple files in the background with bash


I've never written anything this intense in bash. Basically, I want to run a limited number of data import scripts in parallel. To do so, I need to know when one has terminated in order to start the next. However, I'm not sure how to do this in parallel. The following works synchronously:

# watch the output file for "DONE!"
tail -f "$outputfile" | while IFS= read -r OUTPUT
do
  if [[ "${OUTPUT}" == *"DONE!"* ]]
  then
    runNextScript
  fi
done

How can I run this asynchronously?


Solution

  • Basically, I want to run a limited number of data import scripts in parallel. To do so, I need to know when one has terminated in order to start the next.

    One way of doing that is to create a fifo containing as many tokens as the maximum number of concurrent scripts.

    Then, before launching a task, you first consume a token, then launch the task, and put the token back in the fifo once the task has finished. That way, when the maximum number of running scripts is reached, the next one blocks until a token becomes available.

    Not clear? Here is a proof of concept (you will definitely have to adapt it to your needs!):

    • master.sh
    #!/bin/bash
    
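    # (re)create the fifo that will hold the tokens
    rm -f fifo
    mkfifo fifo
    
    # open it read-write on fd 3 (on Linux, this open does not block)
    exec 3<>fifo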
    
    # Simulate 26 tasks
    tasks=$(exec echo {a..z})
    
    # insert 5 tokens in the fifo,
    # i.e. at most 5 workers running at the same time
    for i in {1..5}; do
        (echo T >&3; echo Insert token) &
    done
    
    # launch the tasks when a token is available
    for i in $tasks; do
        read -r _ <&3                          # blocks until a token is available
        ( ./worker.sh "$i"; echo T >&3 ) &     # put the token back once the worker exits
    done
    
    wait
    
    • worker.sh (nothing of much interest: it only simulates doing some work)

    #!/bin/bash

    # simulate doing some stuff
    S=$(( RANDOM % 10 ))
    echo "$(exec date +%s) PID$$ doing task $1 for $S"
    sleep $S
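
The same token scheme can also be packaged in a single file. What follows is only a minimal sketch, not a drop-in replacement: the function name `run_limited` and its argument order are hypothetical, the fifo is created in a private `mktemp -d` directory instead of the current one, and it assumes (as master.sh does) a Linux-like system where a fifo can be opened read-write. It also assumes the number of tokens is small enough to fit in the pipe buffer, so the tokens can be written in the foreground.

```shell
#!/bin/bash
# Hypothetical helper: run "$cmd <task>" once per task, at most $max at a time.
run_limited() {
    local max=$1 cmd=$2 dir task i=0
    shift 2
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/tokens" || return 1

    # Open the fifo read-write on fd 3, as master.sh does.
    exec 3<>"$dir/tokens"

    # One token per allowed concurrent worker (assumed to fit in the pipe buffer).
    while [ "$i" -lt "$max" ]; do
        echo T >&3
        i=$((i + 1))
    done

    for task in "$@"; do
        read -r _ <&3                       # take a token, or block until one is free
        ( "$cmd" "$task"; echo T >&3 ) &    # give the token back when the task ends
    done

    wait            # let the last workers finish
    exec 3>&-       # close the fifo
    rm -rf "$dir"   # and remove the private directory holding it
}

# Example: four one-second tasks, two at a time.
# run_limited 2 sleep 1 1 1 1
```

With the proof of concept above, the equivalent call would be something like `run_limited 5 ./worker.sh {a..z}`.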
    

    Here is a transcript of a session:

    sh$ ./master.sh 
    Insert token
    Insert token
    Insert token
    Insert token
    Insert token
    1405456428 PID3039 doing task a for 0
    1405456428 PID3041 doing task b for 0
    1405456428 PID3046 doing task e for 5
    1405456428 PID3043 doing task c for 5
    1405456428 PID3045 doing task d for 8
    1405456428 PID3055 doing task f for 4
    1405456428 PID3057 doing task g for 0
    1405456428 PID3066 doing task h for 6
    1405456432 PID3070 doing task i for 2
    1405456433 PID3074 doing task j for 3
    1405456433 PID3077 doing task k for 0
    1405456433 PID3082 doing task l for 9
    1405456434 PID3086 doing task m for 3
    1405456434 PID3089 doing task n for 5
    1405456436 PID3094 doing task o for 7
    1405456436 PID3097 doing task p for 7
    1405456437 PID3102 doing task q for 2
    1405456439 PID3106 doing task r for 3
    1405456439 PID3109 doing task s for 3
    1405456442 PID3114 doing task t for 7
    1405456442 PID3118 doing task u for 5
    1405456442 PID3121 doing task v for 7
    1405456443 PID3126 doing task w for 9
    1405456443 PID3129 doing task x for 3
    1405456446 PID3134 doing task y for 9
    1405456447 PID3138 doing task z for 1
    

    The total execution time is around 20s, whereas the total time "worked" by the workers is 113s. If I'm not mistaken, that factor of roughly 5 corresponds to the 5 workers running in parallel.
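
The arithmetic can be checked from the transcript itself. Here is a rough sketch, assuming the output format printed by worker.sh above (the duration is the last field of each "doing task" line); the helper names `total_worked` and `speedup` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: sum the last field (the simulated duration, in seconds)
# of every "doing task" line read from stdin. Other lines are ignored.
total_worked() {
    awk '/doing task/ { sum += $NF } END { print sum + 0 }'
}

# Hypothetical helper: integer ratio of worked time to elapsed wall-clock time.
speedup() {
    local worked=$1 elapsed=$2
    [ "$elapsed" -gt 0 ] || return 1    # avoid dividing by zero
    echo $(( worked / elapsed ))
}

# Possible usage (bash's SECONDS variable counts seconds since shell start):
#   start=$SECONDS
#   worked=$(./master.sh | total_worked)
#   speedup "$worked" "$(( SECONDS - start ))"
```

With the figures quoted above, `speedup 113 20` prints 5.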