Tags: linux, bash, logging, echo

Use of echo >> produces inconsistent results


I've been trying to understand a problem that's cropped up with some of the scripts we use at work.

To generate many of our script logs, we utilize the exec command and file redirects to print all output from the script to both the terminal and a log file. Occasionally, for information that doesn't need to be displayed to the user, we do a straight redirect to the log file.
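The pattern described above can be sketched as follows; the file name, messages, and the `wait` on the `tee` process are illustrative, not taken from the production scripts:

```shell
#!/bin/bash
# Minimal sketch of the dual-output logging pattern: everything the script
# prints goes to both the terminal and a log file via tee, while some lines
# are redirected straight to the log file only.
log="/tmp/dual_log_example.log"
: > "${log}"

exec 3>&1 4>&2               # save the original stdout/stderr on fds 3 and 4
exec > >(tee "${log}") 2>&1  # route all output through tee: terminal + log
tee_pid=$!                   # pid of the tee process substitution

echo "shown to the user and logged"   # travels through tee
echo "logged only" >> "${log}"        # straight redirect, bypasses the terminal

exec 1>&3 2>&4                        # restore stdout/stderr; tee sees EOF
wait "${tee_pid}" 2>/dev/null || true # wait for tee to exit (bash >= 4.4)
```

Note that this sketch reproduces the setup under discussion, including having both the script and `tee` write to the same file.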

The issue we're seeing occurs on the last line written to the log file, where we print the number of errors that occurred during that execution: the text doesn't make it into the file.

In an attempt to diagnose the problem, I wrote a simplified version of our production script (script1.bash) and a test script (script2.bash) to try to tease out the problem.

script1.bash

#!/bin/bash
log_name="${USER}_$(date +"%Y%m%d-%H%M%S")_${HOSTNAME}_${1}.log"
log="/tmp/${log_name}"
log_tmp="/tmp/temp_logs"

err_count=0

finish()
{
   local ecode=0
   if [ $# -eq 1 ]; then
      ecode=${1}
   fi

   # This is the problem line
   echo "Error Count: ${err_count}" >> "${log}"

   mvlog
   local success=$?

   exec 1>&3 2>&4

   if [ ${success} -ne 0 ]; then
      echo ""
      echo "WARNING: Failed to save log file to ${log_tmp}"
      echo ""
      ecode=$((ecode+1))
   fi

   exit ${ecode}
}

mvlog()
{
   local ecode=1

   if [ ! -d "${log_tmp}" ]; then
      mkdir -p "${log_tmp}"
      chmod 775 "${log_tmp}"
   fi

   if [ -d "${log_tmp}" ]; then
      rsync -pt --bwlimit=4096 "${log}" "${log_tmp}/${log_name}" 2> /dev/null
      [ $? -eq 0 ] && ecode=0

      if [ ${ecode} -eq 0 ]; then
         rm -f "${log}"
      fi
   fi
}

exec 3>&1 4>&2 > >(tee "${log}") 2>&1

ecode=0

echo
echo "Some text"
echo

finish ${ecode}

script2.bash

#!/bin/bash
runs=10000
logdir="/tmp/temp_logs"

if [ -d "${logdir}" ]; then
   rm -rf "${logdir}"
fi

for i in $(seq 1 ${runs}); do
   echo "Conducting run #${i}/${runs}..."
   ${HOME}/bin/script1.bash ${i}
done

echo "Scanning logs from runs..."

total_count=`find "${logdir}" -type f -name "*.log*" | wc -l`
missing_count=`grep -L 'Error Count:' ${logdir}/*.log* | grep -c /`

echo "Number of runs performed: ${runs}"
echo "Number of log files generated: ${total_count}"
echo "Number of log files missing text: ${missing_count}"

My first test indicated roughly 1% of the time the line isn't written to the log file. I then proceeded to try several different methods of handling this line of output.

  1. Echo and Wait
echo "Error Count: ${err_count}" >> "${log}"
wait
  2. Alternate print method
printf "Error Count: %d\n" ${err_count} >> "${log}"
  3. No Explicit File Redirection
echo "Error Count: ${err_count}"
  4. Echo and Sleep
echo "Error Count: ${err_count}" >> "${log}"
sleep 0.2

Of these, #1 and #2 each failed about 1% of the time, while #4 failed a staggering 99% of the time. #3 was the only method with a 0% fail rate.

At this point, I'm at a loss for why this behavior is occurring, so I'm asking the gurus here for any insight.

(Note that the simple solution is to implement #3, but I want to know why this is happening.)


Solution

  • Without testing, this looks like a race condition between your script and tee. It's generally better to avoid having multiple programs write to the same file at the same time.

    Here, tee opens the log file without O_APPEND and writes at its own file offset. When your script appends the error count with >> before tee has written everything it has read, tee's later writes can land on top of the appended bytes, which is why the line sometimes vanishes. That would also explain experiment #4: sleeping after the append gives tee time to write its earlier output over the appended line before the log is copied away.

    If you do insist on having multiple writers, make sure they are all in append mode, in this case by using tee -a. Appends to the local filesystem are atomic, so all writes should make it (this is not necessarily true for networked file systems).
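A sketch of the suggested fix, assuming the same fd-save/restore setup as the original script (the file name and `wait` handling here are illustrative):

```shell
#!/bin/bash
# With tee -a, both tee and the direct ">>" redirects open the log with
# O_APPEND, so every write lands at the current end of file and nothing
# gets overwritten.
log="/tmp/append_fix_demo.log"
: > "${log}"                        # start from an empty log

exec 3>&1 4>&2
exec > >(tee -a "${log}") 2>&1      # tee -a appends instead of truncating
tee_pid=$!                          # pid of the tee process substitution

echo "Some text"                    # terminal + log, via tee
echo "Error Count: 0" >> "${log}"   # direct append, safe alongside tee -a

exec 1>&3 2>&4                      # restore stdout/stderr; tee sees EOF
wait "${tee_pid}" 2>/dev/null || true  # let tee finish writing (bash >= 4.4)
```

After the `wait`, both lines should be present in the log regardless of which writer got there first, since neither writer can clobber the other's bytes in append mode.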