How to change Docker stack restarting behaviour?


In our project we inherited a Docker environment that runs a stack of services.

I've noticed that Docker restarts the stack once it hits its memory limit.

Unfortunately, I haven't found anything on Docker's website that answers my questions, so I'm asking here:

  1. Is this behaviour configurable? For instance, I don't want Docker to restart my stack under any circumstances. If it is configurable, then how?
  2. Is there any Docker journal that keeps stack restarts as its entries?

Solution

    1. Is this behaviour configurable? For instance, I don't want Docker to restart my stack under any circumstances. If it is configurable, then how?

    With a version 3 compose file deployed as a stack, the restart policy moved to the deploy section. Valid conditions are none, on-failure, and any (the default):

    version: '3'
    services:
      crash:
        image: busybox
        command: sleep 10
        deploy:
          restart_policy:
            condition: none
            # max_attempts: 2
    

    Documentation on this is available at: https://docs.docker.com/compose/compose-file/#restart_policy
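
    As a rough sketch of how this might be applied and checked from the shell: the file name docker-compose.yml is an assumption, the stack name restart is inferred from the restart_crash service used in the examples here, and the three function names are hypothetical wrappers around standard docker commands.

```shell
#!/bin/sh
# Deploy the compose file as a stack.
# Assumptions: file name docker-compose.yml, stack name "restart".
deploy_stack() {
  docker stack deploy -c "${1:-docker-compose.yml}" "${2:-restart}"
}

# Print the restart policy that swarm stored for a service,
# to confirm the deploy section was picked up.
show_restart_policy() {
  docker service inspect "$1" --format '{{json .Spec.TaskTemplate.RestartPolicy}}'
}

# Switch an already running service to "never restart" without redeploying.
disable_restarts() {
  docker service update --restart-condition none "$1"
}
```

    For example, deploy_stack followed by show_restart_policy restart_crash should let you verify that the condition was applied. Note that disable_restarts changes only the running service; the compose file should be updated as well, or the next deploy will overwrite it.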

    2. Is there any Docker journal that keeps stack restarts as its entries?

    Depending on the task history limit (configurable with docker swarm update --task-history-limit), you can view the previously run tasks for a service:

    $ docker service ps restart_crash
    ID                  NAME                  IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
    30okge1sjfno        restart_crash.1       busybox:latest      bmitch-asusr556l    Shutdown            Complete 4 minutes ago
    papxoq1vve1a         \_ restart_crash.1   busybox:latest      bmitch-asusr556l    Shutdown            Complete 4 minutes ago
    1hji2oko51sk         \_ restart_crash.1   busybox:latest      bmitch-asusr556l    Shutdown            Complete 5 minutes ago
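
    How many old tasks appear in that list depends on the swarm's task history limit. A minimal sketch of reading and changing it on a manager node follows; the function names are hypothetical, and the grep pattern assumes the "Task History Retention Limit" line that docker info prints for an active swarm.

```shell
#!/bin/sh
# Show the current retention limit as reported by `docker info` (manager node).
show_task_history_limit() {
  docker info 2>/dev/null | grep -i 'task history retention limit'
}

# Keep up to N finished tasks per service slot (run on a manager node).
set_task_history_limit() {
  docker swarm update --task-history-limit "$1"
}

# List only the stopped tasks of a service, with untruncated IDs and errors.
stopped_tasks() {
  docker service ps --no-trunc --filter desired-state=shutdown "$1"
}
```

    Raising the limit, for instance with set_task_history_limit 20, keeps more restart history visible in docker service ps, at the cost of more task records stored by the managers.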
    

    You can also inspect the state of any single task:

    $ docker inspect 30okge1sjfno --format '{{json .Status}}' | jq .
    {
      "Timestamp": "2018-11-06T19:55:02.208633174Z",
      "State": "complete",
      "Message": "finished",
      "ContainerStatus": {
        "ContainerID": "8e9310bde9acc757f94a56a32c37a08efeed8a040ce98d84c851d4eef0afc545",
        "PID": 0,
        "ExitCode": 0
      },
      "PortStatus": {}
    }
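
    Since the question is about restarts caused by a memory limit, the exit code is usually the interesting part of that status. The sketch below pulls it out directly; the function names are hypothetical, and the second helper assumes the stopped container still exists on the node where you run it. An exit code of 137 (128 + SIGKILL) is a common sign of an out-of-memory kill, though not proof of one.

```shell
#!/bin/sh
# Print the recorded exit code of one task (task ID from `docker service ps`).
task_exit_code() {
  docker inspect "$1" --format '{{.Status.ContainerStatus.ExitCode}}'
}

# Print the container ID behind a task, for follow-up inspection.
task_container_id() {
  docker inspect "$1" --format '{{.Status.ContainerStatus.ContainerID}}'
}

# Report whether the kernel OOM killer ended a container (prints true or false).
# Run this on the node that hosted the container, before it is removed.
container_oom_killed() {
  docker inspect "$1" --format '{{.State.OOMKilled}}'
}
```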
    

    There's also an event history in the docker engine that you can query:

    $ docker events --filter label=com.docker.swarm.service.name=restart_crash --filter event=die --since 15m --until 0s
    2018-11-06T14:54:09.417465313-05:00 container die f17d945b249a04e716155bcc6d7db490e58e5be00973b0470b05629ce2cca461 (com.docker.stack.namespace=restart, com.docker.swarm.node.id=q44zx0s2lvu1fdduk800e5ini, com.docker.swarm.service.id=uqirm6a8dix8c2n50thmpzj06, com.docker.swarm.service.name=restart_crash, com.docker.swarm.task=, com.docker.swarm.task.id=1hji2oko51skhv8fv1nw71gb8, com.docker.swarm.task.name=restart_crash.1.1hji2oko51skhv8fv1nw71gb8, exitCode=0, image=busybox:latest@sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812, name=restart_crash.1.1hji2oko51skhv8fv1nw71gb8)
    2018-11-06T14:54:32.391165964-05:00 container die d6f98b8aaa171ca8a2ddaf31cce7a1e6f1436ba14696ea3842177b2e5e525f13 (com.docker.stack.namespace=restart, com.docker.swarm.node.id=q44zx0s2lvu1fdduk800e5ini, com.docker.swarm.service.id=uqirm6a8dix8c2n50thmpzj06, com.docker.swarm.service.name=restart_crash, com.docker.swarm.task=, com.docker.swarm.task.id=papxoq1vve1adriw6e9xqdaad, com.docker.swarm.task.name=restart_crash.1.papxoq1vve1adriw6e9xqdaad, exitCode=0, image=busybox:latest@sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812, name=restart_crash.1.papxoq1vve1adriw6e9xqdaad)
    2018-11-06T14:55:00.126450155-05:00 container die 8e9310bde9acc757f94a56a32c37a08efeed8a040ce98d84c851d4eef0afc545 (com.docker.stack.namespace=restart, com.docker.swarm.node.id=q44zx0s2lvu1fdduk800e5ini, com.docker.swarm.service.id=uqirm6a8dix8c2n50thmpzj06, com.docker.swarm.service.name=restart_crash, com.docker.swarm.task=, com.docker.swarm.task.id=30okge1sjfnoicd0lo2g1y0o7, com.docker.swarm.task.name=restart_crash.1.30okge1sjfnoicd0lo2g1y0o7, exitCode=0, image=busybox:latest@sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812, name=restart_crash.1.30okge1sjfnoicd0lo2g1y0o7)
    

    See more details on the events command at: https://docs.docker.com/engine/reference/commandline/events/
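
    The default event output is verbose. One possible way to trim it is the --format option of docker events, sketched here with a hypothetical helper name and a default 15-minute window taken from the command shown here. As far as I know, container events are reported by the engine of the node where the task ran, so this would need to be run on each node of interest.

```shell
#!/bin/sh
# Print one short line per "die" event for a swarm service:
# unix timestamp, task name, and exit code.
# $1 = service name, $2 = look-back window (defaults to 15m).
recent_die_events() {
  docker events \
    --filter "label=com.docker.swarm.service.name=$1" \
    --filter event=die \
    --since "${2:-15m}" --until 0s \
    --format '{{.Time}} {{index .Actor.Attributes "com.docker.swarm.task.name"}} exit={{.Actor.Attributes.exitCode}}'
}
```

    Calling recent_die_events restart_crash 1h would list the last hour of container exits for that service, which can be redirected to a file if you want a simple local journal.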

    The best practice in larger organizations is to send container logs to a central location (e.g. Elastic) and to monitor metrics externally (e.g. Prometheus/Grafana).