docker mesos

Where to find more explicit errors given container error status codes?


I am currently running tasks on a Mesos stack that uses Docker containers.

Some of these tasks occasionally fail.

Here are some of the related TaskStatus messages and reasons:

message: Container exited with status 1 - reason: REASON_COMMAND_EXECUTOR_FAILED
message: Container exited with status 42 - reason: REASON_COMMAND_EXECUTOR_FAILED
message: Container exited with status 137 - reason: REASON_COMMAND_EXECUTOR_FAILED

Is there a table that maps the container exit status codes in TaskStatus messages to more explicit errors?


Solution

  • Command tasks can fail for several reasons, and each failure sets its own exit code. For example, Docker 1.10 sets exit status codes as follows (from the documentation and this answer):

    The exit code from docker run gives information about why the container failed to run or why it exited. When docker run exits with a non-zero code, the exit codes follow the chroot standard, see below:

    125 if the error is with Docker daemon itself:

    $ docker run --foo busybox; echo $?
    # flag provided but not defined: --foo
      See 'docker run --help'.
      125
    

    126 if the contained command cannot be invoked:

    $ docker run busybox /etc; echo $?
    # docker: Error response from daemon: Container command '/etc' could not be invoked.
      126
    

    127 if the contained command cannot be found:

    $ docker run busybox foo; echo $?
    # docker: Error response from daemon: Container command 'foo' not found or does not exist.
      127

    Exit code of contained command otherwise:

    $ docker run busybox /bin/sh -c 'exit 3'; echo $?
    # 3
    

    Another set of exit code conventions can be found here:

    | Code  |            Meaning             |         Example         |                                                   Comments                                                   |
    |-------|--------------------------------|-------------------------|--------------------------------------------------------------------------------------------------------------|
    | 1     | Catchall for general errors    | let "var1 = 1/0"        | Miscellaneous errors, such as "divide by zero" and other impermissible operations                            |
    | 2     | Misuse of shell builtins       | empty_function() {}     | Missing keyword or command, or permission problem (and diff return code on a failed binary file comparison). |
    | 126   | Command invoked cannot execute | /dev/null               | Permission problem or command is not an executable                                                           |
    | 127   | "command not found"            | illegal_command         | Possible problem with $PATH or a typo                                                                        |
    | 128   | Invalid argument to exit       | exit 3.14159            | exit takes only integer args in the range 0 - 255 (see first footnote)                                       |
    | 128+n | Fatal error signal "n"         | kill -9 $PPID of script | $? returns 137 (128 + 9)                                                                                     |
    | 130   | Script terminated by Control-C | Ctl-C                   | Control-C is fatal error signal 2, (130 = 128 + 2, see above)                                                |
    | 255*  | Exit status out of range       | exit -1                 | exit takes only integer args in the range 0 - 255                                                            |
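
The two lists above can be folded into a small lookup. The following is only a sketch: `explain_exit_status` is a hypothetical helper, not part of Docker or Mesos, and it assumes the status is an integer between 0 and 255 that follows the conventions quoted above. A command is free to use any of these values for its own purposes, so the result is a hint rather than a diagnosis.

```shell
#!/bin/sh
# Hypothetical helper (not a Docker or Mesos tool): map a container exit
# status to a short hint, using the conventions listed in this answer.
# Assumes $1 is an integer between 0 and 255.
explain_exit_status() {
    rc=$1
    case "$rc" in
        0)   echo "success" ;;
        1)   echo "catchall for general errors" ;;
        2)   echo "misuse of shell builtins" ;;
        125) echo "error in docker run itself" ;;
        126) echo "contained command cannot be invoked" ;;
        127) echo "contained command cannot be found" ;;
        128) echo "invalid argument to exit" ;;
        255) echo "exit status out of range" ;;
        *)
            if [ "$rc" -gt 128 ] && [ "$rc" -lt 255 ]; then
                # 128+n convention: the process was ended by signal n.
                echo "killed by signal $((rc - 128))"
            else
                echo "exit code chosen by the command itself"
            fi
            ;;
    esac
}

# The three status codes from the question.
for s in 1 42 137; do
    printf '%s: %s\n' "$s" "$(explain_exit_status "$s")"
done
# 1: catchall for general errors
# 42: exit code chosen by the command itself
# 137: killed by signal 9
```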
    

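    Applied to your examples: status 1 is the catchall for general errors, status 42 appears in neither list and is therefore set by the command itself, and status 137 is 128 + 9, meaning the process was killed by signal 9 (SIGKILL).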

    If you need more information to explain a status code, check the message field of the Mesos TaskStatus update; for example, Mesos reports OOM information there. The same information can also be found in the Mesos logs. To debug why a command returned a non-zero code, check the files stored in the executor sandbox, especially stderr and stdout, or any command-specific logs.
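
As a rough illustration of that last step, the sketch below prints the tail of the stderr and stdout files in a sandbox directory. `show_sandbox_logs` is a hypothetical helper, and the sandbox path is an assumption that you have to supply yourself, because its location depends on how your Mesos agent is configured.

```shell
#!/bin/sh
# Hypothetical helper: show the last lines of the stderr and stdout files
# in a task sandbox. $1 is the sandbox directory (an assumption, supplied
# by the caller); $2 is the number of lines to show (defaults to 20).
show_sandbox_logs() {
    sandbox_dir=$1
    count=${2:-20}
    for name in stderr stdout; do
        file="$sandbox_dir/$name"
        if [ -f "$file" ]; then
            echo "== $name (last $count lines) =="
            tail -n "$count" "$file"
        else
            echo "== $name not found in $sandbox_dir =="
        fi
    done
}
```

It could be called as `show_sandbox_logs /path/to/sandbox 50`, where the path is a placeholder for the sandbox directory of the failed task.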