Tags: bash, shell, while-loop, subshell

Bash while loop stops after first iteration when subshell is called


This contrived bash script demonstrates the issue.

#!/bin/bash
while read -r node ; do
    echo checking $node for Agent;
       PID=$(ssh $node ""ps -edf | grep [j]ava | awk '{print $2}'"")
       echo $PID got to here.
done < ~/agents_master.list

agents_master.list contains 1 server per line:

server1
server2
server3

Which only outputs the following:

checking server1 for Agent
Authorized use only
25176 got to here

server2 and server3 aren't even echoed to the screen by the line echo checking $node...

If I comment out the line PID=$(.... then the while completes the whole agents_master.list file correctly...

checking server1 for Agent
got to here
checking server2 for Agent
got to here
checking server3 for Agent
got to here

From the googling I've done, it sounds like this is related to the subshell that $(...) creates, but I don't understand why it is causing the loop to stop at the first server, server1.

Yes, this code could be rewritten, but I'm keen to understand this behaviour of bash and why it happens, for future reference.


Solution

  • The problem (or at least one of the problems) is that ssh forwards its stdin to the remote server. Inside the loop, ssh inherits the loop's stdin, which is redirected from agents_master.list, so it reads the rest of that file. As it happens, the command you are running on the remote server (ps -edf, see below) doesn't use its standard input, but ssh will still forward whatever it reads, just in case. As a consequence, nothing is left for read to read, so the loop ends after the first iteration.

    To avoid that, use ssh -n (or redirect input to /dev/null yourself, which is what the -n option does).
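
The effect can be reproduced without ssh. The following is only a sketch: drain_stdin is a hypothetical stand-in for ssh (all it does is read everything on its stdin, which is the behaviour that matters here), and the function names and the inline server list are made up for the illustration.

```bash
#!/bin/bash
# drain_stdin is a hypothetical stand-in for ssh: it reads everything
# available on its stdin, as ssh does when it forwards input.
drain_stdin() { cat > /dev/null; }

loop_broken() {
    while read -r node; do
        echo "checking $node"
        drain_stdin                # swallows the remaining lines of the list
    done <<EOF
server1
server2
server3
EOF
}

loop_fixed() {
    while read -r node; do
        echo "checking $node"
        drain_stdin < /dev/null    # what ssh -n does for ssh
    done <<EOF
server1
server2
server3
EOF
}

loop_broken    # prints only: checking server1
loop_fixed     # prints a line for each of the three servers
```

The only difference between the two loops is where the inner command gets its stdin from; the command substitution itself plays no part.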

    There are a couple of other issues which are not actually interfering with your script's execution.

    First, I have no idea why you use "" in

    ssh $node ""ps -edf | grep [j]ava | awk '{print $2}'""
    

    The "" "expands" to an empty string, so the above is effectively identical to

    ssh $node ps -edf | grep [j]ava | awk '{print $2}'
    

    That means that the grep and awk commands are being run on the local host; the output from the ps command is forwarded back to the local host by ssh. That doesn't change the result, although it does make the brackets in [j]ava redundant: the grep won't show up in the process list, since it is not running on the host where ps is executed. In fact, it's a good thing that the brackets are redundant, because they might not survive into the command at all. An unquoted [j]ava is a glob pattern, and if there happens to be a file named java in your current working directory, the shell replaces the pattern with that filename. You really should quote that argument.
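
A small sketch of the quoting problem, run in a throwaway directory so that nothing in your working directory is touched. The function name glob_demo is made up for this example, and echo stands in for grep so that the argument it receives is visible.

```bash
#!/bin/bash
# glob_demo is a hypothetical name; it works inside a temporary directory.
glob_demo() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        touch java                 # a file whose name matches the pattern
        echo unquoted: [j]ava      # the shell replaces the pattern with: java
        echo quoted: "[j]ava"      # the pattern reaches the command intact
    )
    rm -rf "$dir"
}

glob_demo
# unquoted: java
# quoted: [j]ava
```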

    I presume that what you intended was to run the entire pipeline on the remote machine, in which case you might have tried:

    ssh $node "ps -edf | grep [j]ava | awk '{print $2}'"
    

    and found that it didn't work. It wouldn't have worked because the $2 in the awk command will be expanded to whatever $2 is in your current shell; the $2 is not protected by the interior single quotes. As far as bash is concerned, $2 is just part of a double-quoted string. (It would also shift the issue of the unquoted argument to grep to the remote host, so you'll have problems if there is a file named java in the home directory on the remote host.)
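
The difference between the two quoting styles can be seen locally. In this sketch, sh -c is an assumed stand-in for the remote shell that ssh would start, the ps line is made-up sample data, and the function names are hypothetical.

```bash
#!/bin/bash
# sh -c stands in for the remote shell; the sample line is invented.
sample='root 25176 1 java'

double_quoted() {
    set --    # no positional parameters, as in a script run without arguments
    # The local shell expands $2 (empty here), so awk receives {print }.
    printf '%s\n' "$sample" | sh -c "awk '{print $2}'"
}

single_quoted() {
    # Nothing is expanded locally; the inner shell turns \$2 into $2 for awk.
    printf '%s\n' "$sample" | sh -c 'awk "{print \$2}"'
}

double_quoted    # prints the whole line: root 25176 1 java
single_quoted    # prints: 25176
```

With an empty $2 the awk program degenerates to {print }, which prints every input line in full rather than the second field.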

    So what you actually want is

    ssh -n $node 'ps -edf | grep "[j]ava" | awk "{print \$2}"'
    

    Finally, don't use PID as the name of a shell variable. Variable names in all upper case are generally reserved, and it is perilously close to BASHPID and PPID, which are specific bash variables. Your own shell variables should have lower-case names, as in any other programming language.
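
Putting the points of this answer together, the loop might look like the sketch below. The wrapper function check_agents and passing the list path as its first argument are assumptions made for the example, as is quoting "$node"; the ssh command line is the one suggested above.

```bash
#!/bin/bash
# check_agents is a hypothetical wrapper; $1 is the path to the server list.
check_agents() {
    while read -r node; do
        echo "checking $node for Agent"
        # -n keeps ssh away from the loop's stdin; the single quotes send
        # the whole pipeline, unexpanded, to the remote shell.
        pid=$(ssh -n "$node" 'ps -edf | grep "[j]ava" | awk "{print \$2}"')
        echo "$pid got to here."
    done < "$1"
}
```

It would be called as check_agents ~/agents_master.list.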