Tags: shell, pipe, producer-consumer

Asynchronously consuming a pipe with bash


I have a bash script like this:

data_generator_that_never_quits | while read data
do
    an_expensive_process_with "$data"
done

The first process continuously generates events (at irregular intervals) which need to be processed as they become available. A problem with this script is that read consumes only a single line of the output; since the processing is very expensive, I want it to consume all the data that is currently available. On the other hand, the processing must start immediately when new data becomes available. In a nutshell, I want to do something like this:

data_generator_that_never_quits | while read_all_available data
do
    an_expensive_process_with "$data"
done

where the command read_all_available will wait if no data is available for consumption, or copy all the currently available data into the variable. It is perfectly fine if the data does not consist of full lines. Basically, I am looking for an analog of read that reads the entire pipe buffer instead of just a single line from the pipe.
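
The closest builtin primitive I can find is read's -t timeout flag: with a timeout of 0 it merely reports whether input is already waiting, without consuming anything (bash 4 and later), so it is not quite what I need:

# read -t 0 consumes nothing; it only tells whether input is waiting,
# and any actual reading still happens one line at a time
if read -t 0; then
    echo "input is available on stdin"
fi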

For the curious among you, the background of the question is that I have a build script which needs to trigger a rebuild on a source file change, and I want to avoid triggering rebuilds too often. Please do not suggest using grunt, gulp or other available build systems; they do not work well for my purpose.

Thanks!


Solution

  • I think I have found the solution after gaining a better insight into how subshells work. This script appears to do what I need:

    data_generator_that_never_quits | while true
    do
        # wait until the next element becomes available; stop at EOF
        read LINE || break
        # consume any remaining elements; a small timeout ensures that
        # rapidly fired events are batched together
        while read -t 1 LINE; do true; done
        # the data buffer is empty, launch the process
        an_expensive_process
    done
    

    It would be possible to collect all the read lines into a single batch, but I don't really care about their contents at this point, so I didn't bother figuring that part out :)

    Added on 25.09.2014

    Here is the final subroutine, in case it could be useful to someone one day:

    flushpipe() {
        local line
        # wait until the next line becomes available; fail at EOF
        read -r buffer || return 1
        # consume any remaining lines; a small timeout ensures that
        # rapidly fired events are batched together
        while read -r -t 1 line; do buffer="$buffer"$'\n'"$line"; done
        # print the batch; quoting preserves the embedded newlines
        printf '%s\n' "$buffer"
    }
    

    To be used like this:

    data_generator_that_never_quits | while true
    do
        # wait until data becomes available; stop when the generator exits
        data=$(flushpipe) || break
        # the data buffer is empty, launch the process
        an_expensive_process_with "$data"
    done
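
    Putting it all together for my rebuild scenario, the script could look roughly like this. Note that inotifywait (from inotify-tools), the watched src/ directory and the make target are just placeholders for the real setup:

    #!/usr/bin/env bash
    # a sketch of the debounced rebuild loop, with flushpipe as defined above

    flushpipe() {
        local line
        read -r buffer || return 1
        while read -r -t 1 line; do buffer="$buffer"$'\n'"$line"; done
        printf '%s\n' "$buffer"
    }

    # inotifywait -m emits one line per filesystem event, forever
    inotifywait -m -r -e modify,create,delete src/ | while true
    do
        # events fired within one second of each other end up in one batch
        data=$(flushpipe) || break
        make build   # placeholder for the expensive rebuild step
    done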