
Named pipe stops working


I have two shell scripts, carrier.sh and pie.sh, and two named pipes, Answer and Query, in a folder. carrier.sh:

while read a
do
if [ "$a" = 'exit' ] ; then exit 0
fi
echo $a
#echo $a >> troll.txt
done

pie.sh:

( ./carrier.sh > Answer ) < Query &
echo 'Foo' > Query
read x < Answer
echo $x
echo 'Boo' > Query
read x < Answer
echo $x
echo 'exit' > Query

Let's try:

$ ./pie.sh
Foo

...

At this point it waits for input instead of printing Boo and returning to a new prompt. When I enable the commented-out line, it works fine. Why is that? I think it may be due to blocking, buffering or non-flushing in pipes. My original intention was to communicate with mysql-server from a shell script within a single session (where I hit the same problem), but for that I would probably get answers like "use php" :p. What could I change in my code, ideally only in pie.sh?


Solution

  • A FIFO is a peculiar creature. Once the only process with it open for writing exits, the reader gets EOF and has to reopen the pipe for reading to get any more input. In your scripts, therefore, carrier.sh exits after reading and echoing the Foo. Your pie.sh attempts to open the FIFO for writing again, and it gets hung waiting for a non-existent reader to open it once more.
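
    The EOF behaviour described above can be seen in isolation with a minimal sketch. The scratch directory and the `lines_seen` variable are names made up for this demo and are not part of the original scripts. A background reader counts the lines it receives before its loop ends, while the foreground writes a single line and closes the FIFO:

```shell
#!/bin/sh
# Demo: a FIFO reader sees EOF as soon as the last writer closes the FIFO.
dir=$(mktemp -d)            # scratch directory (hypothetical, demo only)
mkfifo "$dir/Query"

# Reader: count the lines received before EOF, then record the count.
( n=0
  while read -r line; do n=$((n + 1)); done < "$dir/Query"
  echo "$n" > "$dir/count" ) &

echo 'Foo' > "$dir/Query"   # opens the FIFO, writes one line, closes it
wait                        # returns once the reader's loop has ended at EOF

lines_seen=$(cat "$dir/count")
echo "reader saw $lines_seen line(s) before EOF"
rm -rf "$dir"
```

    The reader's loop ends after one line even though nothing told it to stop; a second `echo ... > Query` would find no reader unless something reopened the FIFO.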

    I modified carrier.sh to read:

    while read a
    do
        if [ "$a" = 'exit' ] ; then exit 0
        fi
        echo $a
        #echo $a >> troll.txt
    done
    
    echo "$0: out of here" >&2
    

    The output (on a Mac running macOS High Sierra 10.13.1 and an ancient Bash 3.25) was:

    $ bash pie.sh
    ./carrier.sh: out of here
    Foo
    pie.sh: line 6: Answer: Interrupted system call
    Foo
    

    and the output froze.

    How can we fix it? It is actually quite tricky. Nominally, 'all' that's required is to tell the carrier.sh which FIFO it is meant to read from and which it is meant to write to, so it can open them each time around. That means you need code like this:

    carrier.sh

    input=$1
    output=$2
    
    while true
    do
        echo "$0: ($$) about to enter inner loop" >&2
        while read a
        do
            if [ "$a" = 'exit' ]
            then
                echo "$0: ($$) exit" >&2
                exit 0
            fi
            echo "$a" > "$output"
            echo "$0: ($$) $a echoed back to standard output" >&2
        done < "$input"
        echo "$0: ($$) inner loop complete" >&2
    done
    
    echo "$0: ($$) out of here" >&2
    

    pie.sh

    #( ${SHELL:-bash} ./carrier.sh > Answer ) < Query &
    
    rm -f Query Answer
    mkfifo Query Answer
    
    ( ${SHELL:-bash} ./carrier.sh Query Answer ) &
    
    echo "$0 ($$): at work"
    echo 'Foo' > Query
    read x < Answer
    echo $x
    echo 'Boo' > Query
    read x < Answer
    echo $x
    echo 'exit' > Query
    echo "$0 ($$): finished"
    

    And an example run:

    $ bash pie.sh
    pie.sh (65389): at work
    ./carrier.sh: (65392) about to enter inner loop
    ./carrier.sh: (65392) Foo echoed back to standard output
    ./carrier.sh: (65392) inner loop complete
    ./carrier.sh: (65392) about to enter inner loop
    Foo
    ./carrier.sh: (65392) Boo echoed back to standard output
    ./carrier.sh: (65392) inner loop complete
    Boo
    ./carrier.sh: (65392) about to enter inner loop
    pie.sh (65389): finished
    ./carrier.sh: (65392) exit
    $ ps
      PID TTY           TIME CMD
      782 ttys000    0:03.59 -bash
      798 ttys001    0:00.03 -bash
      821 ttys002    0:03.27 -bash
    $
    

    Warning!

    I got some modestly bizarre behaviour while testing (earlier versions of) the code with various processes hanging around unexpectedly. Versions of the carrier script that I'd edited minutes before continued to respond, which confused me. It's why there's a ps listing in the output; it shows that the ttys002 shell which I was using to test has no children left. Contrast with an earlier (confusing) state where I got:

    $ ps
      PID TTY           TIME CMD
      782 ttys000    0:03.59 -bash
      798 ttys001    0:00.03 -bash
      821 ttys002    0:03.20 -bash
    65214 ttys002    0:00.00 bash pie.sh
    65258 ttys002    0:00.01 ksh -x carrier.sh
    65296 ttys002    0:00.00 /bin/bash -x carrier.sh
    65304 ttys002    0:00.00 /bin/bash carrier.sh
    65316 ttys002    0:00.00 bash pie.sh
    $ kill 65316 65304 65296 65258 65214
    $
    

    That mess is part of the reason the debug output from carrier.sh includes the PID: when I saw messages without the PID after I'd edited the script to include it, I eventually clued in to this problem. Interrupts didn't just kill everything. Quite how or why 65316 survived my interrupt, I'm not sure; ditto 65214. The various incarnations of carrier.sh are perhaps less surprising. Just make sure your test environment is clean before running the tests.
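
    One way to reduce the chance of such leftovers is to have the launching script kill its own background helper when it exits. The following is a sketch under assumptions, not something taken from the scripts above: `sleep 300` stands in for the background carrier.sh, and `child_pid`, `cleanup` and `helper_state` are hypothetical names.

```shell
#!/bin/sh
# Sketch: make sure a background helper does not outlive the script that started it.
sleep 300 &                      # stand-in for the background carrier.sh
child_pid=$!

cleanup() {
    kill "$child_pid" 2>/dev/null || true    # ask the helper to terminate
    wait "$child_pid" 2>/dev/null || true    # reap it so it does not linger
}
trap cleanup EXIT                # runs when the script exits
trap 'exit 1' INT TERM           # turn an interrupt into an exit, so cleanup still runs

cleanup                          # called directly here only to show the effect
if kill -0 "$child_pid" 2>/dev/null
then helper_state=running
else helper_state=gone
fi
echo "helper is $helper_state"
```

    This only covers the helper that the script itself started; processes left over from earlier runs still have to be found with `ps` and killed by hand, as above.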

    Another possible way to 'fix' the problem would be for pie.sh to launch a script, run in the background, that opens the FIFOs appropriately but then sleeps (without reading or writing). This keeps the FIFOs open, so the main processes never see a premature EOF and can work more freely. pie.sh would kill the background process as it exits. If you investigate this, think carefully about whether the background process opens the FIFOs for reading, for writing, or both. The hard part is making sure that the open operations complete: an open for reading won't complete until there's a writer, and an open for writing won't complete until there's a reader. I've not explored the ins and outs of this, but it should work. If you try it, be cautious about your setup, and make sure you don't have stray processes hanging around unexpectedly.
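
    A variant of the keep-the-FIFOs-open idea, offered as a sketch rather than as the approach described above: instead of a separate sleeping process, the calling script holds both FIFOs open itself on spare file descriptors. This rests on an assumption: opening a FIFO read-write with `<>` returns immediately on Linux and macOS, but POSIX leaves that behaviour undefined. The `carrier` function is a stand-in for the original, unmodified carrier.sh; the descriptor numbers 3 and 4, the scratch directory, and the `first` and `second` variables are arbitrary choices for the demo.

```shell
#!/bin/sh
# Sketch: hold both FIFOs open for the whole session so no reader ever sees EOF.
dir=$(mktemp -d)                 # scratch directory (hypothetical, demo only)
mkfifo "$dir/Query" "$dir/Answer"

# Assumption: a read-write open of a FIFO does not block waiting for a peer.
# That holds on Linux and macOS; POSIX leaves it undefined.
exec 3<>"$dir/Query" 4<>"$dir/Answer"

# Stand-in for the original carrier.sh: echo each line back until 'exit'.
carrier() {
    while read -r a
    do
        if [ "$a" = 'exit' ]; then exit 0; fi
        echo "$a"
    done
}
carrier <&3 >&4 &                # runs in a background subshell

echo 'Foo' >&3; read -r first <&4;  echo "$first"
echo 'Boo' >&3; read -r second <&4; echo "$second"
echo 'exit' >&3
wait                             # carrier leaves its loop on 'exit'

exec 3>&- 4>&-                   # release the FIFOs
rm -rf "$dir"
```

    Because descriptors 3 and 4 stay open from start to finish, each FIFO always has both a reader and a writer, so neither side gets EOF between messages and only the calling script has to change.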