Tags: docker, docker-compose, docker-network

Docker Compose Fails To Add Container to Network


I have a very simple docker-compose.yml file:

version: '3'
services:
    server:
        build: .
        ports:
            - "4567:4567"
        environment:
            - ENDPOINT_PORT=4567
    ssh:
        build:
            context: .
            dockerfile: Dockerfile-ssh-service
        ports:
          - 22
        depends_on:
          - server

Until last week, the following command reliably created a network containing the ssh containers and a container built from my project, with the correct ports exposed between them and the correct port bindings on the host machine:

docker-compose --project-name project_name up --build -d --scale ssh=3

Names                 Ports                  Command
-----                 -----                  -------
project_name_ssh_3    0.0.0.0:32774->22/tcp  "/usr/sbin/sshd -D"
project_name_ssh_1    0.0.0.0:32775->22/tcp  "/usr/sbin/sshd -D"
project_name_ssh_2    0.0.0.0:32776->22/tcp  "/usr/sbin/sshd -D"
project_name_server_1 0.0.0.0:4567->4567/tcp "bundle exec scripts/boot.rb"

Unfortunately, for about the last week the success rate of this command has dropped to 50% or less. On most attempts, compose now fails to add project_name_server_1 to the compose network and fails to create its listener on the host:

Names                 Ports                 Command
-----                 -----                 -------
project_name_ssh_2    0.0.0.0:32800->22/tcp "/usr/sbin/sshd -D"
project_name_ssh_3    0.0.0.0:32799->22/tcp "/usr/sbin/sshd -D"
project_name_ssh_1    0.0.0.0:32798->22/tcp "/usr/sbin/sshd -D"
project_name_server_1                       "bundle exec scripts/boot.rb"

Nothing about the compose file or the Dockerfiles it builds has changed in the last week or between runs of that command, so I'm at a loss to explain why compose sometimes creates the network correctly and sometimes does not.

I see the same behavior with docker-compose on Windows, and some of my colleagues see it on Mac.

Update:

If I run docker ps enough times during a failed run, I see odd behavior: the server container starts first and gets its listener, then Docker creates the other containers, removes the listener from the server container, adds the listeners for the ssh containers, and never re-adds the listener for the server. (dps is a custom PowerShell function I have that returns docker ps output as PowerShell objects instead of a table string.)

dps | ft names, ports, command

Names                 Ports Command
-----                 ----- -------
project_name_server_1       "bundle exec scripts/boot.rb"

dps | ft names, ports, command

Names                 Ports                  Command
-----                 -----                  -------
project_name_ssh_2                           "/usr/sbin/sshd -D"
project_name_ssh_3                           "/usr/sbin/sshd -D"
project_name_ssh_1                           "/usr/sbin/sshd -D"
project_name_server_1 0.0.0.0:4567->4567/tcp "bundle exec scripts/boot.rb"

dps | ft names, ports, command

Names                 Ports Command
-----                 ----- -------
project_name_ssh_2          "/usr/sbin/sshd -D"
project_name_ssh_3          "/usr/sbin/sshd -D"
project_name_ssh_1          "/usr/sbin/sshd -D"
project_name_server_1       "bundle exec scripts/boot.rb"

dps | ft names, ports, command

Names                 Ports                 Command
-----                 -----                 -------
project_name_ssh_2    0.0.0.0:32867->22/tcp "/usr/sbin/sshd -D"
project_name_ssh_3    0.0.0.0:32869->22/tcp "/usr/sbin/sshd -D"
project_name_ssh_1    0.0.0.0:32868->22/tcp "/usr/sbin/sshd -D"
project_name_server_1                       "bundle exec scripts/boot.rb"

Solution

  • The problem turned out to be a bug in scripts/boot.rb.

    That script would start and then crash almost immediately. Once it crashed, the container stopped and was removed from the network. Docker tears down a stopped container's network endpoint and port bindings, which would explain the missing listener.

    When the project started using a more reliable script to start the web server, this issue went away.
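
    One way to confirm this kind of failure is to ask Docker for the container's state and exit code instead of reading the Ports column. The following is only a sketch under assumptions: describe_state and check_container are hypothetical helper names that are not part of the original project, and the container name in the usage note is taken from the question. The only Docker call used is docker inspect with a --format template.

```shell
#!/bin/sh
# Hypothetical helper: turn a container status and exit code into a short hint.
# The status values are the ones Docker reports in .State.Status.
describe_state() {
    status="$1"
    exit_code="$2"
    case "$status" in
        running)
            echo "running: published ports should be listed"
            ;;
        exited)
            if [ "$exit_code" = "0" ]; then
                echo "exited cleanly: no ports are published"
            else
                echo "crashed with exit code $exit_code: check 'docker logs'"
            fi
            ;;
        *)
            echo "state '$status': inspect the container further"
            ;;
    esac
}

# Hypothetical helper: query Docker for one container and describe its state.
check_container() {
    name="$1"
    # Prints two fields separated by a space, for example "exited 1".
    state=$(docker inspect --format '{{.State.Status}} {{.State.ExitCode}}' "$name") || return 1
    # Split the two fields into positional parameters.
    # shellcheck disable=SC2086
    set -- $state
    describe_state "$1" "$2"
}
```

    After sourcing the file, check_container project_name_server_1 reports whether the server container is still running. If it reports a crash, docker logs project_name_server_1 should show the output of the start script up to the point where it failed.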