Tags: bash, time, timeout, exit-code, check-mk

GNU time returns a different exit code than it prints out


While running cron jobs and using mk-job from check_mk to monitor their results, I stumbled across this:

bash:

$ /usr/bin/time -f "time_exit: %x" timeout -s SIGKILL 2s sleep 10; echo "shell_exit: $?"
Command terminated by signal 9
time_exit: 0
shell_exit: 137

The exit code returned by /usr/bin/time differs from the exit code it writes to the formatted output:

time_exit != shell_exit

Why?

But when sending SIGHUP instead, the exit codes match:

$ /usr/bin/time -f "time_exit: %x" timeout -s SIGHUP 2s sleep 10; echo "shell_exit: $?"
Command exited with non-zero status 124
time_exit: 124
shell_exit: 124

In the meantime I will use timeout -k 10s 2s ..., which first sends timeout's default signal (SIGTERM) and then, 10s later, SIGKILL if the process is still running, in the hope that the first signal stops it properly.

Background

check_mk provides mk-job to monitor job executions. mk-job uses /usr/bin/time to record both the execution time and the exit code.

man time:

The time command returns when the program exits, stops, or is terminated by a signal. If the program exited normally, the return value of time is the return value of the program it executed and measured. Otherwise, the return value is 128 plus the number of the signal which caused the program to stop or terminate.

man timeout:

... It may be necessary to use the KILL (9) signal, since this signal cannot be caught, in which case the exit status is 128+9 rather than 124.


Solution

  • GNU time's %x is only meaningful when the process exits normally, not when it is killed by a signal.

    [STEP 101] $ /usr/bin/time -f "%%x = %x" bash -c 'exit 2'; echo '$? = '$?
    Command exited with non-zero status 2
    %x = 2
    $? = 2
    [STEP 102] $ /usr/bin/time -f "%%x = %x" bash -c 'exit 137'; echo '$? = '$?
    Command exited with non-zero status 137
    %x = 137
    $? = 137
    [STEP 103] $ /usr/bin/time -f "%%x = %x" bash -c 'kill -KILL $$'; echo '$? = '$?
    Command terminated by signal 9
    %x = 0
    $? = 137
    [STEP 104] $
    

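    For time timeout -s SIGKILL 2s sleep 10, the outcome matches STEP 103, not STEP 102: time prints "Command terminated by signal 9", so timeout itself was killed. Unless --foreground is given, timeout sends the signal to its whole process group, which includes timeout itself, and SIGKILL cannot be ignored. time therefore sees a child terminated by a signal, %x prints a meaningless 0, and time's own exit status is 128 + 9 = 137.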


    UPDATE:

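    I took a look at time's source code and found that %x calls WEXITSTATUS() unconditionally, whether or not the process exited normally. For a process killed by a signal, WEXITSTATUS() is not meaningful (on Linux it evaluates to 0), which explains the %x = 0 above: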

    case 'x':           /* Exit status.  */
      fprintf (fp, "%d", WEXITSTATUS (resp->waitstatus));
      break;
    

    The Git master branch adds a new %Tx directive that checks WIFEXITED() first:

    case 'T':
      switch (*++fmt)
        {
        ...
        case 'x': /* exit code IF terminated normally */
          if (WIFEXITED (resp->waitstatus))
            fprintf (fp, "%d", WEXITSTATUS (resp->waitstatus));
          break;
    

    And from the Git master's time --help output:

      ...
    
      %Tt  exit type (normal/signalled)
      %Tx  numeric exit code IF exited normally
      %Tn  numeric signal code IF signalled
      %Ts  signal name IF signalled
      %To  'ok' IF exited normally with code zero
    
      ...