docker docker-compose docker-swarm docker-stack

How to spawn an interactive container in an existing docker swarm?

Note: I've tried searching for existing answers in any way I could think of, but I don't believe there's any information out there on how to achieve what I'm after

Context

I have an existing swarm running a bunch of networked services across multiple hosts. The deployment is done via docker-compose build && docker stack deploy. Several of the services contain important state necessary for the functioning of the main service this stack is for, including when interacting with it via CLI.

Goal

How can I create an ad-hoc container within the existing stack running on my swarm for interactive diagnostics and troubleshooting of my main service? The service has a CLI interface, but it needs access to the other components for that CLI to function, thus it needs to be run exactly as if it were a service declared inside docker-compose.yml. Requirements:

I need to run it in an ad-hoc fashion. This is for troubleshooting by an operator, so I don't know when exactly I'll need it
It needs to be interactive, since it's troubleshooting by a human
It needs to be able to run an arbitrary image (usually the image built for the main service and its CLI, but sometimes other diagnostics might be needed through other containers I won't know ahead of time)
It needs to have full access to the network and other resources set up for the stack, as if it were a regular predefined service in it

So far the best I've been able to do is:

Find an existing container running my service's image
SSH into the swarm host on which it's running
docker exec -ti into it to invoke the CLI

This however has a number of downsides:

I don't want to be messing with an already running container, it has an important job I don't want to accidentally interrupt, plus its state might be unrelated to what I need to do and I don't want to corrupt it
It relies on the service image also having the CLI installed. If I want to separate the two, I'm out of luck
It relies on some containers already running. If my service is entirely down and in a restart loop, I'm completely hosed because there's nowhere for me to exec in and run my CLI
I can only exec within the context of what I already have declared and running. If I need something I haven't thought to add beforehand, I'm sadly out of luck
Finding the specific host on which the container is running and going there manually is really annoying

What I really want is a version of docker run I could point to the stack and say "run in there", or docker stack run, but I haven't been able to find anything of the sort. What's the proper way of doing that?

Solution

Option 1

deploy a diagnostic service as part of the stack - a container with useful tools in it, with an entrypoint of tail -f /dev/null - use a placement contraint to deploy this to a known node.

services:
  diagnostics:
    image: nicolaka/netshoot
    command: tail -f /dev/null
  deploy:
    placement:
      constraints:
        - node.hostname == host1

NB. You do NOT have to deploy this service with your normal stack. It can be in a separate stack.yml file. You can simply stack deploy this file to your stack later, and as long as --prune is not used, the services are cumulative.

Option 2

To allow regular containers to access your services - make your network attachable. If you havn't specified the network explicitly you can just explicitly declare the default network.

networks:
  default:
    driver: overlay
    attachable: true

Now you can use docker run and attach to the network with a diagnostic container :-

docker -c manager run --rm --network <stack>_default -it nicolaka/netshoot

Option 3

The third option does not address the need to directly access the node running the service, and it does not address the need to have an instance of the service running, but it does allow you to investigate a service without effecting its state and without needing tooling in the container.

Start by executing the usual commands to discover the node and container name and id of the service task of interest:

docker service ps ${service} --no-trunc --format '{{.Node}} {{.Name}}.{{.ID}}' --filter desired-state=running

Then, assuming you have docker contexts to match your node names: - pick one ${node}, ${container} from the list of {{.Node}}, {{.Name}}.{{.ID}} and run a container such as ubuntu or netshoot, attaching it to the network namespace of the target container.

docker -c ${node} run --rm -it --network container:${container} nicolaka/netshoot

This container can be used to perform diagnostics in the context of the running service task, and then closed without affecting it.