Error Handling for Several Commands in Bash Script


I'm looking for an elegant way to do error handling on multiple commands in a Bash script. I am controlling a thermostat in a house. I have a cron job that (after sanity checks) turns off my A/C during summer months based on our power company's time-of-use (TOU) rates. If the checks pass, it sends a command to my thermostat to turn the A/C off, and touches a file at /~/therm/.therm-tod-override-enabled (not actually using ~). The script I'm working on now runs at the end of peak TOU rates. It checks that the override file exists and that the A/C is still off (so someone can still turn the A/C back on if needed). If both conditions are true, it turns the A/C back on and removes the file. If either is not true, it does nothing.

I'm having an issue with the file getting removed but the A/C not turning back on. It may be down to a cheap USB-to-serial adapter that occasionally throws errors, but I've replaced it with a better one and still get sporadic results. Sometimes the XML status from the thermostat doesn't pull correctly or is corrupted, so xmlstarlet can't read it.

I'd like to have the initial XML read try up to 3 times before entering the xmlstarlet parsing, and if reading or parsing fails, send a message to the Debian error log. My control program "omnistat" will exit non-zero if anything is wrong in reading or writing to the serial port. I'm just starting to learn Bash scripting, but this is what I have now (over-commented and directory changed to ~/therm):

#!/bin/bash

# Runs from cron only M-F May-September at 7:02p.  If watchfile found and mode still off, have thermostat resume cooling.

# Poll the thermostat, format the result in XML, and drop the XML in a file
~/therm/omnistat -d /dev/ttyUSB0 -x > ~/therm/therm.xml
# Pull thermostat mode from XML, values can be Heat, Cool, Off
thermode=$(xmlstarlet sel -t -v "/therm/name_mode" ~/therm/therm.xml)
# Check for existing override file generated by turn-off script, value is 1 or 0.
# Thermostat program does the actual file check for other unrelated PHP use-case reasons.
isoverride=$(xmlstarlet sel -t -v "/therm/tod_override" ~/therm/therm.xml)
if [[ "$thermode" == "Off" && "$isoverride" == "1" ]]; then
    # Set mode back to Cool
    ~/therm/omnistat -d /dev/ttyUSB0 -M C
    # Remove the override-enabled touch file
    rm ~/therm/.therm-tod-override-enabled
fi

In my last attempt at this before I was doing XML parsing, I found an ugly but functional way to do just the thermostat error checking. It was:

loopcount=0
exterrlev=44
while [ $exterrlev -ne 0 ] && [ $loopcount -lt 4 ]; do
    ~/therm/omnistat -d /dev/ttyUSB0 -M O
    exterrlev=$?
    loopcount=$((loopcount+1))
done

But now I want to check for errors before the xmlstarlet parsing (and abort the script with an error before continuing), then check for errors during the xmlstarlet parsing (and abort the script with an error before continuing), and finally check for errors on the mode set (and report any error to the Debian error log). I figure there has to be a far more elegant way to do this.


Solution

  • I'd like to have the initial XML read try up to 3 times before entering the xmlstarlet parsing, and if reading or parsing fails send a message to the Debian error log.

    You have at least two separate cases where you want to retry a command up to three times, until it succeeds or you run out of tries. Writing a shell function for that will make the main script clearer and allow you to avoid repeating yourself. There are many alternatives for that, but here's a pretty simple one:

    try3() {
        "$@" || "$@" || "$@"
    }
    

    You just put try3 in front of the command you might need to retry. It returns a success status as soon as one attempt succeeds, or otherwise the (failure) status of the third attempt.
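
    If exactly three back-to-back attempts turn out to be too rigid, the same idea generalizes. The following is only a sketch: the names retry and RETRY_DELAY are made up here, and pausing between attempts is an assumption about what might help a flaky serial link, not something the thermostat program is known to need.

```shell
# retry N CMD [ARGS...]: run CMD up to N times, stopping at the first success.
# Returns 0 on success, otherwise the exit status of the last failed attempt.
# RETRY_DELAY (seconds to sleep between attempts, default 0) is a made-up knob.
retry() {
    local tries=$1
    local status=1
    local i=1
    shift
    while [ "$i" -le "$tries" ]; do
        "$@" && return 0
        status=$?
        # Pause before the next attempt, but not after the last one
        if [ "$i" -lt "$tries" ]; then
            sleep "${RETRY_DELAY:-0}"
        fi
        i=$((i + 1))
    done
    return "$status"
}

# Demo with a stand-in command that fails twice and then succeeds
attempts=0
flaky() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}
retry 3 flaky && echo "succeeded after $attempts attempts"
```

    It is used the same way as try3, with the attempt count as the first argument, for example: retry 3 ~/therm/omnistat -d /dev/ttyUSB0 -M C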

    You also have several places where you want to log an error message and exit in the event of failure. A shell function can ease that, too:

    fail() {
      [[ $# -gt 0 ]] && logger "$*"
      exit 1
    }
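
    Since the goal is for failures to show up as errors in the system log, one possible refinement is to give logger an explicit priority and tag. This is a sketch, not a required change: the tag therm-resume is an invented name, and where user.err messages end up depends on how syslog or journald is configured on the machine.

```shell
# Variant of fail() that logs at error priority under a script-specific tag.
# "therm-resume" is an assumed tag; pick whatever identifies your script.
fail() {
    if [ "$#" -gt 0 ]; then
        # -t sets the tag shown on the log line, -p sets facility.level
        logger -t therm-resume -p user.err "$*"
    fi
    exit 1
}
```

    The tag makes the script's messages easy to pick out later, for example by searching the log for therm-resume.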
    

    With those, you can do something along these lines:

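    # Try up to three times to read the thermostat; log an error and exit if unsuccessful.
    # The redirection applies to the whole try3 call, so every attempt writes to the same file.
    try3 ~/therm/omnistat -d /dev/ttyUSB0 -x > ~/therm/therm.xml ||
      fail thermostat read failed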
    
    # Extract the thermostat mode; log an error and exit if the command fails
    thermode=$(xmlstarlet sel -t -v "/therm/name_mode" ~/therm/therm.xml) ||
      fail thermostat mode parse failed
    
    # Extract the thermostat override status; log an error and exit if the command fails
    isoverride=$(xmlstarlet sel -t -v "/therm/tod_override" ~/therm/therm.xml) ||
      fail thermostat override parse failed
    
    # Validate the mode and override state; log an error and exit if invalid
    case "${thermode}${isoverride}" in
      Heat[01]|Cool[01]|Off[01]) ;;
      *) fail incorrect mode "(${thermode})" or override "(${isoverride})" ;;
    esac
    
    # If the thermostat is off and overridden then turn it back on.
    # Log an error and exit if unsuccessful.
    if [[ "$thermode" == "Off" ]] && [[ "$isoverride" == "1" ]]; then
      # Set mode back to Cool
      try3 ~/therm/omnistat -d /dev/ttyUSB0 -M C ||
        fail failed to re-enable A/C
      
      rm ~/therm/.therm-tod-override-enabled ||
        fail failed to remove override file
    fi
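
    One caveat when retrying the read step: the question's script redirects omnistat's output into therm.xml, and a redirection placed on a retry wrapper is opened once for the whole call, so partial output from a failed attempt would sit in the file ahead of the output of a later successful one. A possible way around that, sketched here with an invented helper named fetch_to, is to write each attempt to a temporary file and move it into place only on success.

```shell
# fetch_to FILE CMD [ARGS...]: run CMD up to three times, capturing its stdout
# in a temporary file, and move that over FILE only after an attempt succeeds.
# fetch_to is a hypothetical helper, not part of omnistat or the original script.
fetch_to() {
    local dest=$1
    local attempt=1
    local tmp
    shift
    # mktemp creates the file with mode 600; chmod it afterwards if other
    # users (a web server, say) need to read the result
    tmp=$(mktemp "${dest}.XXXXXX") || return 1
    while [ "$attempt" -le 3 ]; do
        # Each attempt truncates the temp file, so partial output from a
        # failed attempt never leaks into the next one
        if "$@" > "$tmp"; then
            mv "$tmp" "$dest"
            return
        fi
        attempt=$((attempt + 1))
    done
    rm -f "$tmp"
    return 1
}

# Possible usage with the read command from the question:
# fetch_to ~/therm/therm.xml ~/therm/omnistat -d /dev/ttyUSB0 -x ||
#   fail thermostat read failed
```

    With this approach a failed run leaves any previous therm.xml untouched instead of overwriting it with a corrupted one.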
    

    You will note considerable use there of the || operator. That's key to one of the conventional error-handling idioms: command-that-may-fail || do-this-on-error. The command on the left is executed, and then if and only if it exits with a failure status, the command on the right is executed.
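
    To see the idiom in isolation, here is a tiny sketch using the true and false builtins as stand-ins for commands that succeed and fail. The report function and the path are invented for the demonstration.

```shell
# report is a made-up handler that just announces it was called
report() { echo "handler ran after: $1"; }

true  || report "true"     # left side succeeds, so the handler is skipped
false || report "false"    # left side fails, so the handler runs

# Braces group several recovery steps into the single right-hand command
missing=/nonexistent/therm.xml    # hypothetical path that does not exist
[ -r "$missing" ] || {
    echo "cannot read $missing" >&2
    result=unreadable
}
```

    The brace group runs in the current shell, so variables set inside it (result here) remain visible afterwards, and an exit inside it would end the whole script, which is what the fail function relies on.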

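    Note also the change to the if condition: it is written as two [[ ]] tests joined by the shell's && operator instead of a single test with && inside it. Both forms behave the same here.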