Tags: bash, awk, infinite-loop, tee

How to write to file the data received in an infinite loop


Is there a way to write data to a file as it is received inside an infinite loop? I have a script that displays web content in my terminal as it appears on the web page, but all my attempts to tee the data have resulted in an empty file. I suppose this is because, without ever exiting the loop, there is no opportunity to write anything to the file. But I have read about infinite loops filling a hard drive with unwanted data, so it seems that writing the output of a command pipeline should be possible as well.

get_page() {

    osascript -e \
    'tell application "Google_Chrome" to tell window 1 to tell active tab to execute javascript "document.body.innerText"'

}

while get_page | grep -E '[:alnum:]' 
do 
    sleep 1 & 
done < <(get_page) | awk '!x[$0]++'

Note that the only reason this works at all is the awk '!x[$0]++' command, which (correct me if my explanation is not accurate) reads the input it receives and removes duplicate lines while preserving the order of the remaining lines. Without that in place, this script would be insane.
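
To see what that awk filter does in isolation, here is a minimal sketch with made-up input (the lines a, b and c are placeholders, not real page content). The expression !x[$0]++ is true only the first time a given line is seen, so awk prints first occurrences in their original order.

```bash
#!/usr/bin/env bash
# Hypothetical sample input containing repeated lines.
deduped=$(printf '%s\n' a b a c b | awk '!x[$0]++')

# Prints a, b, c on separate lines: later repeats of a and b are dropped.
printf '%s\n' "$deduped"
```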


Solution

  • A few things:

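    1. The loop isn't infinite. It iterates until the condition pipeline, get_page | grep -E '[:alnum:]', returns non-zero. A pipeline's exit status is that of its last command, so the loop ends the first time grep finds no match.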

    2. Do you want the loop to run once a second? In that case, remove the & after sleep 1, or the loop will iterate much faster than that. The & puts the sleep process in the background, and the loop continues immediately without waiting for it.

    3. You're calling get_page twice: once in the loop condition and once in the process substitution after done. This is probably unintended. I'm not sure what it returns, but you probably want something like the following instead:

      while true; do
        get_page
        sleep 1
      done | awk '!seen[$0]++' | tee output.log
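
The first point above can be checked in isolation. This is only a sketch: fake_page is a hypothetical stand-in for get_page that prints text for the first two calls and nothing afterwards, and the character class is written as [[:alnum:]]. The loop stops as soon as grep finds nothing to match, because the status of the condition pipeline is the status of grep.

```bash
#!/usr/bin/env bash
# fake_page is a hypothetical stand-in for get_page: it prints a line
# only while its argument is below 2.
fake_page() {
    if [ "$1" -lt 2 ]; then
        echo "text $1"
    fi
}

i=0
iterations=0
# The condition's exit status is grep's: 0 on a match, 1 on no match.
while fake_page "$i" | grep -q '[[:alnum:]]'; do
    iterations=$((iterations + 1))
    i=$((i + 1))
done

echo "loop ended after $iterations iterations"
```

With this stand-in the loop body runs twice and then exits, which is why a while true loop is the safer choice when the intent is to poll forever.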
      

    If that still doesn't solve it, the cause is probably buffering done by awk, as pointed out in the comments below. To force awk to flush its output buffer after each printed line, use:

    awk '!seen[$0]++ { print; fflush() }'
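
Put together, the pipeline might look like the sketch below. It is a bounded version so that it terminates: emit_lines is a hypothetical stand-in for the page-reading function, the loop runs three times instead of forever, and a temporary file stands in for output.log. It shows the shape of the pipeline rather than the buffering itself, which only becomes visible when the loop never ends and awk's output goes to a pipe.

```bash
#!/usr/bin/env bash
# emit_lines is a hypothetical stand-in for the function that reads the page.
emit_lines() {
    printf '%s\n' alpha beta alpha
}

# Temporary file standing in for output.log.
log_file=$(mktemp)

# Three iterations instead of "while true" so the example terminates.
for _ in 1 2 3; do
    emit_lines
done | awk '!seen[$0]++ { print; fflush() }' | tee "$log_file"

# Read back what tee wrote, then clean up.
logged=$(cat "$log_file")
rm -f "$log_file"
```

Both the terminal and the file receive each unique line once; with fflush() in place, awk hands every line to tee as soon as it is printed instead of waiting for its buffer to fill.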
    

    A slight issue with this is that the awk process keeps a copy of every unique input line in memory, so its memory use grows as more unique lines are read from the output of get_page.
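
If that growth ever matters, one possible workaround (an assumption on my part, not something the original script needs) is to empty the array once it holds a fixed number of lines. The trade-off is that a line seen before the reset will be printed again the next time it appears. In this sketch max_seen is a hypothetical cap, set very low for illustration, and the input is made up.

```bash
#!/usr/bin/env bash
# max_seen is a hypothetical cap on how many unique lines awk remembers.
max_seen=3

capped=$(printf '%s\n' a b a c a d a |
    awk -v max="$max_seen" '
        !seen[$0]++ {
            print
            fflush()
            # Count stored lines; once the cap is reached, empty the array.
            if (++count >= max) {
                split("", seen)   # portable way to clear an awk array
                count = 0
            }
        }')

# Prints a, b, c, a, d: the second "a" reappears after the reset.
printf '%s\n' "$capped"
```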