
BASH: Is it possible to multi-thread a function within a single BASH script? If so, how?


In my use case, I've got a single-threaded backup script. I know from other experience that the hardware can handle around 30X the I/O bandwidth, yet single-threaded the script takes so long to complete that it's still running into the morning work period.

The trouble is, just backgrounding the individual I/O bound commands doesn't work because there are follow-on tasks that depend on previous tasks' results. Most importantly, if rsync reports no updates, I want to skip creating a formal backup file (such as zip or tar), plus do a tiny bit of reporting on the matter. There are other cases of order dependency, but getting rsync results before moving on in that set of commands is vital.

In the past I've just multi-processed by kicking off a companion script but over time this has proven problematic as editing multiple scripts becomes error-prone as times change and needs shift. So, I'd like to keep it all in one script.

I've noticed that functions can have, according to my read of the documentation anyway, redirected input and output, and that got me thinking about the use of a function or braces grouping and related mechanisms; perhaps there's a way to group these related commands and then background the lot of them?

If so, this would be a "cheap" way of multi-threading a BASH script but even as I type this I'm thinking NO?! It's ONE BASH instance?! However, I know I'm ignorant about many wonderful features of BASH, so maybe?

Note that I've considered tracking pids, and it's on the one hand conceptually straightforward, but on the other, non-trivial on its own because of the follow-on I/O bound task using the same context (set of arguments). ...If that's what I'd have to do, ... arrays and so forth would be in my future, matching up arguments with pids! ... I might just keep on using a companion script, OR consider writing a short script on the fly in temporary files?! -ugh!- Not looking forward to that, either.

Is there another way?


Solution

  • As noted by tjm3772 in the first comment to my question, the answer to my question is YES; you CAN background a function and, apparently, any set of BASH commands you put inside curly braces, { and }.

    And, as noted by the second commenter, markp-fuso, this effectively forks the running code, thus making it its own process, and that's how it "multi-threads": by actually multi-processing. So this is, in fact, "a 'cheap' way of multi-threading a BASH script," as I suspected might be possible and suggested in my question.

    Importantly...

    Note that this only works without further complications, as markp-fuso pointed out, if the backgrounded tasks don't need to share any of their results with the "parent" process. This is because the process is forked and thus no longer shares context; variables set in the child are not visible to the parent.

    So if the backgrounded tasks don't need to report anything back, it's fine. But if further interchange is required, some other mechanism needs to be used and, well, it is no longer "cheap multi-tasking" in my view!
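    A quick way to see that isolation in action (a minimal sketch; the variable name is arbitrary): a variable assigned inside a backgrounded brace group never makes it back to the parent shell.

    ```shell
    #!/usr/bin/env bash

    count=0

    # The & forks the brace group into a child process; the assignment
    # below changes the child's copy of count, not the parent's.
    { count=42; } &

    wait                     # let the background job finish
    echo "count is $count"   # still prints: count is 0
    ```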

    My first use of this technique has no such requirement and so, when the code ran last night, this reduced the run-time (elapsed "wall-clock" time) from around 9–10 hours to about 1 hour 4 minutes. Not bad!

    My code is FAR too large and complex to post here, but as a partial "concrete example," in pseudo-code:

    function run_rsync()
    {
       local source="$1"
       local destination="$2"
       local syncLog
       syncLog=$(date +"$logPath/%Y%m%d%H%M%S%N")
       rsync $rsync_flags "$source" "$destination" > "$syncLog"
       # < ... parse "$syncLog", set syncCount to the number of changes found ... >
       if [ "$syncCount" -gt 0 ]
       then
          run_tar "$source"
       fi
    }
    
    function run_tar()
    {
       local source="$1"
       local tmp
       tmp=$(basename "$source")
       local tFileName="$tar_dir/$tmp.tar"
       tar $tar_flags "$tFileName" "$source"
    }
    
    if [ -n "$skip_sync" ]     # quoting matters: unquoted, -n is always true
    then
       run_tar "$source" &
    else
       run_rsync "$source" "$destination" &
    fi
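
    The pattern scales naturally to several backup sets; here's a minimal sketch of a driver loop (the backup_one function and the directory names are invented for illustration, standing in for the rsync/tar work above):

    ```shell
    #!/usr/bin/env bash

    # Hypothetical stand-in for the per-set rsync/tar work.
    backup_one()
    {
       local source="$1"
       echo "finished $source"
    }

    for source in /srv/www /srv/mail /srv/db   # illustrative paths
    do
       backup_one "$source" &    # each set runs in its own forked process
    done

    wait    # block until every background job has completed
    ```

    The single `wait` at the end is what keeps the script from exiting while backups are still in flight.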
    

    I DO have some follow-on questions about the processes so created. For example, do they continue when the "parent" exits? At the moment, I have a 'wait' command that gets executed before exiting.

    For another, I'm curious about redirecting the stdin and stdout of functions - this might well have some interesting applications! But these are also separate issues.
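
    As a taste of that follow-on idea, here's a minimal sketch (the function names and the log path are made up): a redirection can be attached either at the call site, like any command, or permanently to the function definition itself.

    ```shell
    #!/usr/bin/env bash

    # Redirect at the call site: the function's stdin/stdout behave
    # just like an external command's.
    to_upper() { tr '[:lower:]' '[:upper:]'; }

    echo "hello" | to_upper          # prints HELLO

    # Or attach the redirection to the definition; it is applied on
    # every invocation, so each call appends to the same file.
    log_note() { echo "$(date +%T) $*"; } >> /tmp/notes.log

    log_note "backup started"        # lands in /tmp/notes.log, not stdout
    ```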