Search code examples
bashsshio-redirectioncacti

Captured output of command on remote host (SSH via Cron) is blank


Below is a script which logs into a remote host (a Cisco IOS-XR router) and runs a single command via SSH. The idea is to grab the result of the command (an integer) so that it can be graphed by Cacti. Cacti runs this script every 5 minutes when it runs it's normal poll routine:

#!/bin/bash

if [[ -z $1 ]]
then
    exit 1
fi

HOST="$1"
USER="cact-ssh-user"
TIMEOUT=10s
export SSHPASS="aaaaaaaaaaaaa"

CMD="show controllers np struct IPV4-LEAF-FAST-P np0 | in Entries"
RAW_OUTPUT=$(timeout $TIMEOUT sshpass -e ssh -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null $USER@$HOST "$CMD" 2>/dev/null)
GRT_UCASTV4_USED=$(echo -n "$RAW_OUTPUT" | grep "Entries" | awk '{print $6}' | tr -d "," | tr -d " ")

echo -n "ucastv4_used:$GRT_UCASTV4_USED"

This command works fine via an interactive shell (when I run the script on the Cacti server using /path/to/script/script.sh 10.0.0.1. However when the Cacti cronjob runs the output is simply blank. So on my SSH session to the Cacti server the output is:

$ ./script 10.0.0.1
ucastv4_used:1234

In the Cacti log the output is: 05/22/2017 03:35:21 PM - SPINE: Poller[0] Host[69] TH[1] DS[6837] SCRIPT: /opt/scripts/cacti-scripts/asr9001-get-tcam-ucast-usage.sh 10.0.0.1, output: ucastv4_used:

I have su'ed to the Cacti user and the script works just fine. So this seems to be specific to it runnings as a cronjob, the ouput from the SSH command is being redirected somewhere magically and I don't know where or why.

To try and debug this I have added the following lines to the script (directly under #!/bin/bash) and wait for the Cacti 5 minute poll interval to run (I can see in the Cacti log when the script is called every 5 minutes);

exec >/tmp/stdout.log 2>/tmp/stderr.log
set -x

The stdout.log just contains ucastv4_used: the same as cacti.log and the stderr.log file contains the login banner for the remote SSH host and nothing else. Where has the SSH output gone?

I have tired to change the SSH line in the script to output to a file and then read from there:

timeout $TIMEOUT sshpass -e ssh -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null $USER@$HOST "$CMD" > /tmp/output 2>/dev/null
GRT_UCASTV4_USED=$(grep "Entries" /tmp/output | awk '{print $6}' | tr -d "," | tr -d " ")

The file /tmp/output is empty and so the GRT_UCASTV4_USED variable is empty also. stdout.log ends up being the same as before: ucastv4_used:

I have also tried to change #!/bin/bash to #!/bin/bash -i to force an interactive session. This kind of works in that with -i if I add echo $PS1 to the script I can see in the stdout.log file that $PS1 is set and without -i it prints nothing. However there is still no output from the SSH command. Where is the command of the SSH output going?

I have also tried to use ssh ..... | tee /tmp/output so that the output should show up in /tmp/output and /tmp/stdout.log but both are blank.

I can see on the remote router that the SSH session is comming in and running the command. This is from debug ssh server:

RP/0/RSP0/CPU0:May 22 14:52:57.976 UTC: SSHD_[65909]: (open_master_file) command added show controllers np struct IPV4-LEAF-FAST-P np0 | in Entries

Also since this is working via my interactive session with the Cacti server I am guessing the issue is on there and not the router. I am also confident that Cacti it's self is not the problem, I can trigger spine to poll this router host from my interactive SSH session and the script works fine (further pointing to the issue that some how in a non-interactive shell the SSH output is evaporating):

$ cd /usr/local/spine/bin
$ ./spine -V 7 69 69
...
05/22/2017 04:06:56 PM - SPINE: Poller[0] Host[69] TH[1] DS[6837] SCRIPT: /opt/scripts/cacti-scripts/asr9001-get-tcam-ucast-usage.sh 10.0.0.1, output: ucastv4_used:658809

So it seems that the SSH output is being redirected somewhere and I can't "get it" or the router somehow knows this is a non-interactive SSH client and isn't sending anything back. How else can I debug this?

Update 1 Using debug ssh server on the Cisco router I have captured the debug logs when I am running the script via my interactive SSH session to the Cacti server and when it runs via Cacti's poll interval/cron job. I have diff'ed the output and the only interesting looking difference I can find (besides stuff like the SSH PID changing and ephemeral source port of the Cacti server changing etc.) is the following:

*** 132,145 ****
   (sshd_interactive_shell) *** removing alarm
   sshd_interactive_shell - ptyfd = 46
   event_contex_init done
!  sshd_ptytonet - Channel 1 Received EOT (bytes:1)
!  sshd_ptytonet - Channel 1 exec command executed sending CHANNEL_CLOSE
!  (close_channel), pid:182260085, sig rcvd:1, state:10 chan_id:1
!  addrem_ssh_info_tuple: REMOVE Inside the critical Section %pid:182260085
!  Cleanup sshd process 182260085, session id 1, channel_id 1
!  addrem_ssh_info_tuple: REMOVE exiting the Critical Section %pid:182260085
   close_channel: Accounting stopped: scriptaccount
!  In delete channel code, pid:182260085, sig rcvd:1, state:10 chan_id:1
   Sending Exit Status: 0 sig: 1
   Sending Channel EOF msg
   Sending Channel close msg for remote_chan_id = 0 chan_id = 1
--- 134,147 ----
   (sshd_interactive_shell) *** removing alarm
   sshd_interactive_shell - ptyfd = 46
   event_contex_init done
!  Pad_len = 6, Packlen = 12
!  sshd_nettopty: EOF received. Disconnecting session
!  (close_channel), pid:182329717, sig rcvd:1, state:10 chan_id:1
!  addrem_ssh_info_tuple: REMOVE Inside the critical Section %pid:182329717
!  Cleanup sshd process 182329717, session id 1, channel_id 1
!  addrem_ssh_info_tuple: REMOVE exiting the Critical Section %pid:182329717
   close_channel: Accounting stopped: scriptaccount
!  In delete channel code, pid:182329717, sig rcvd:1, state:10 chan_id:1
   Sending Exit Status: 0 sig: 1
   Sending Channel EOF msg
   Sending Channel close msg for remote_chan_id = 0 chan_id = 1

The top half is my interactive session with the Cacti server. I note in the top hald sshd_ptytonet - Channel 1 Received EOT (bytes:1) whereas via the cronjob the debug shows sshd_nettopty: EOF received. Disconnecting session. Is the non-interactive session simply passing my SSH command to the remote host and quiting as quickly as possible (so it's not waiting for the SSH server to respond with the command output)?


Solution

    • First, tell SSH client to not to allocate a PTY with -T option, because obviously cron doesn't have one.
    • Then give it something infinite on stdin, so it will keep running until stdout is open, we have /dev/zero exactly for this purpose.

    RAW_OUTPUT=$(timeout $TIMEOUT sshpass -e ssh -T -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null $USER@$HOST "$CMD" </dev/zero 2>/dev/null)