Named pipe stops working

I have two shell scripts, carrier.sh and pie.sh, and two named pipes, Answer and Query, in a folder. carrier.sh:

while read a
do
if [ "$a" = 'exit' ] ; then exit 0
fi
echo $a
#echo $a >> troll.txt
done

pie.sh:

( ./carrier.sh > Answer ) < Query &
echo 'Foo' > Query
read x < Answer
echo $x
echo 'Boo' > Query
read x < Answer
echo $x
echo 'exit' > Query

Let's try:

$ ./pie.sh
Foo

...

At this point it hangs, waiting for input, instead of printing Boo and returning to a new prompt. When I uncomment the commented-out line, it works OK. Why is that? I think it may be due to blocking, buffering or non-flushing in pipes. My original intention was to communicate with mysql-server from a shell script within a single session (and there's the same problem), but for that I would probably get answers like "use php" :p. What could I change in my code, ideally only in pie.sh?

Solution

A FIFO is a peculiar creature. Once the last process that has it open for writing closes it (or exits), the reader gets EOF and has to reopen the pipe for reading to get any more input. In your scripts, echo 'Foo' > Query opens the FIFO, writes one line and closes it again, so carrier.sh reads and echoes the Foo, sees EOF, and exits. Your pie.sh then attempts to open the FIFO for writing again, and it hangs waiting for a non-existent reader to open it once more.
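To see the EOF behaviour in isolation, here is a minimal sketch. The temporary directory, the FIFO name demo and the log file are illustrative choices, not part of the original scripts. A reader opens the FIFO once and loops on read; a single echo then opens the FIFO for writing, writes one line and closes it.

```shell
#!/bin/sh
# Sketch: a reader on a FIFO gets EOF as soon as the only writer closes it.
# All names here (dir, fifo, log) are hypothetical, chosen for the demo.
dir=$(mktemp -d)
fifo="$dir/demo"
mkfifo "$fifo"

# Reader: opens the FIFO once and counts lines until read reports EOF.
(
    count=0
    while read -r line
    do
        count=$((count + 1))
    done < "$fifo"
    echo "reader saw EOF after $count line(s)"
) > "$dir/log" &

# Writer: open, write one line, close. After this the FIFO has no writer.
echo 'Foo' > "$fifo"

wait                      # returns because the reader's loop ended on EOF
result=$(cat "$dir/log")
echo "$result"
rm -rf "$dir"
```

The reader's loop ends after one line because its only writer has gone, which is the situation carrier.sh is in after the first echo in the question.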

I modified carrier.sh to read:

while read a
do
    if [ "$a" = 'exit' ] ; then exit 0
    fi
    echo $a
    #echo $a >> troll.txt
done

echo "$0: out of here" >&2

The output (on a Mac running macOS High Sierra 10.13.1 and an ancient Bash 3.25) was:

$ bash pie.sh
./carrier.sh: out of here
Foo
pie.sh: line 6: Answer: Interrupted system call
Foo

and the output froze.

How can we fix it? It is actually quite tricky. Nominally, 'all' that's required is to tell the carrier.sh which FIFO it is meant to read from and which it is meant to write to, so it can open them each time around. That means you need code like this:

carrier.sh

input=$1
output=$2

while true
do
    echo "$0: ($$) about to enter inner loop" >&2
    while read a
    do
        if [ "$a" = 'exit' ]
        then
            echo "$0: ($$) exit" >&2
            exit 0
        fi
        echo "$a" > "$output"
        echo "$0: ($$) $a echoed back to standard output" >&2
    done < "$input"
    echo "$0: ($$) inner loop complete" >&2
done

echo "$0: ($$) out of here" >&2

pie.sh

#( ${SHELL:-bash} ./carrier.sh > Answer ) < Query &

rm -f Query Answer
mkfifo Query Answer

( ${SHELL:-bash} ./carrier.sh Query Answer ) &

echo "$0 ($$): at work"
echo 'Foo' > Query
read x < Answer
echo $x
echo 'Boo' > Query
read x < Answer
echo $x
echo 'exit' > Query
echo "$0 ($$): finished"

And an example run:

$ bash pie.sh
pie.sh (65389): at work
./carrier.sh: (65392) about to enter inner loop
./carrier.sh: (65392) Foo echoed back to standard output
./carrier.sh: (65392) inner loop complete
./carrier.sh: (65392) about to enter inner loop
Foo
./carrier.sh: (65392) Boo echoed back to standard output
./carrier.sh: (65392) inner loop complete
Boo
./carrier.sh: (65392) about to enter inner loop
pie.sh (65389): finished
./carrier.sh: (65392) exit
$ ps
  PID TTY           TIME CMD
  782 ttys000    0:03.59 -bash
  798 ttys001    0:00.03 -bash
  821 ttys002    0:03.27 -bash
$

Warning!

I got some modestly bizarre behaviour while testing (earlier versions of) the code with various processes hanging around unexpectedly. Versions of the carrier script that I'd edited minutes before continued to respond, which confused me. It's why there's a ps listing in the output; it shows that the ttys002 shell which I was using to test has no children left. Contrast with an earlier (confusing) state where I got:

$ ps
  PID TTY           TIME CMD
  782 ttys000    0:03.59 -bash
  798 ttys001    0:00.03 -bash
  821 ttys002    0:03.20 -bash
65214 ttys002    0:00.00 bash pie.sh
65258 ttys002    0:00.01 ksh -x carrier.sh
65296 ttys002    0:00.00 /bin/bash -x carrier.sh
65304 ttys002    0:00.00 /bin/bash carrier.sh
65316 ttys002    0:00.00 bash pie.sh
$ kill 65316 65304 65296 65258 65214
$

That mess is part of the reason the debug output from carrier.sh includes the PID: when I saw messages without the PID after I'd edited the script to include it, I eventually clued into this problem. Interrupts didn't just kill everything. Quite how/why 65316 survived my interrupt, I'm not sure; ditto 65214. The various incarnations of carrier.sh are perhaps less surprising. Just make sure your test environment is clean before running the tests.
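One way to reduce the chance of leftovers like these is for the launching script to record the PID of its background child and kill it from a trap when the script exits or is interrupted. The following is only a sketch under assumptions: sleep 300 stands in for the carrier.sh launch, and the names child and cleanup are made up for the illustration.

```shell
#!/bin/sh
# Sketch: kill the background child whenever this script exits.
# 'sleep 300' is a stand-in for launching carrier.sh in the background.
sleep 300 &
child=$!

cleanup() {
    # Ignore errors: the child may already have exited.
    kill "$child" 2>/dev/null || true
    wait "$child" 2>/dev/null || true
}

trap cleanup EXIT          # runs on normal exit
trap 'exit 1' INT TERM     # turn an interrupt into an exit so cleanup runs

if kill -0 "$child" 2>/dev/null
then
    echo "$0 ($$): child $child is running"
fi

# ... the Query/Answer exchanges would go here ...
```

Routing INT and TERM through exit matters because some shells do not run the EXIT trap when they are killed by a signal. Running ps afterwards is still a sensible check.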

Another possible way to 'fix' the problem would be for pie.sh to launch a script that opens the FIFOs appropriately but then sleeps (without reading or writing). It would have to be run in the background. This keeps the FIFOs open, so the main processes can work more freely, and pie.sh would kill the background process as it exits. If you investigate this, you need to think carefully about whether the background process opens the FIFOs for reading, for writing, or both. The hard part is making sure that the open operations complete: an open for reading won't complete until there's a writer, and an open for writing won't complete until there's a reader. I've not explored the ins and outs of this; it should work, but if you try it, be cautious about your setup and make sure you don't have stray processes hanging around unexpectedly.
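The following is a rough sketch of a variation on this idea, not something tested above: instead of a separate sleeping process, pie.sh holds both FIFOs open itself through two extra file descriptors. The descriptor numbers 3 and 4, the temporary directory and the second variable y are illustrative choices. carrier.sh is essentially the stdin-to-stdout loop from the question, written out by the sketch only so that it is self-contained.

```shell
#!/bin/sh
# Sketch: keep Query and Answer open for the whole session so carrier.sh
# never sees EOF between messages. Names and fd numbers are assumptions.
dir=$(mktemp -d)
cd "$dir" || exit 1

# The carrier loop from the question (reads stdin, writes stdout).
cat > carrier.sh <<'EOF'
while read a
do
    if [ "$a" = 'exit' ] ; then exit 0
    fi
    echo "$a"
done
EOF

mkfifo Query Answer

# The background child opens Query for reading first, then Answer for writing.
sh ./carrier.sh < Query > Answer &

# Open in the same order so that each blocking open finds its partner.
exec 3> Query     # write end held open: no EOF for carrier.sh between lines
exec 4< Answer    # read end held open: carrier.sh always has a reader

echo 'Foo' >&3
read -r x <&4
echo "$x"
echo 'Boo' >&3
read -r y <&4
echo "$y"
echo 'exit' >&3

exec 3>&- 4<&-    # close both descriptors
wait              # reap carrier.sh
cd / && rm -rf "$dir"
```

The order of the two exec lines is the delicate part: swapping them would leave pie.sh waiting for a writer on Answer while carrier.sh is still waiting for a writer on Query.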