I have a very simple docker-compose.yml file:
version: '3'
services:
  server:
    build: .
    ports:
      - "4567:4567"
    environment:
      - ENDPOINT_PORT=4567
  ssh:
    build:
      context: .
      dockerfile: Dockerfile-ssh-service
    ports:
      - 22
    depends_on:
      - server
Until last week, this would reliably create a network with the ssh containers and the built server container for my project, exposing the correct ports to each other and creating the correct port bindings on the host machine:
docker-compose --project-name project_name up --build -d --scale ssh=3
Names Ports Command
----- ----- -------
project_name_ssh_3 0.0.0.0:32774->22/tcp "/usr/sbin/sshd -D"
project_name_ssh_1 0.0.0.0:32775->22/tcp "/usr/sbin/sshd -D"
project_name_ssh_2 0.0.0.0:32776->22/tcp "/usr/sbin/sshd -D"
project_name_server_1 0.0.0.0:4567->4567/tcp "bundle exec scripts/boot.rb"
Unfortunately, for about the last week the success rate of this command has dropped to 50% or less. On most attempts now, compose fails to add project_name_server_1 to the compose network and fails to create its listener on the host.
Names Ports Command
----- ----- -------
project_name_ssh_2 0.0.0.0:32800->22/tcp "/usr/sbin/sshd -D"
project_name_ssh_3 0.0.0.0:32799->22/tcp "/usr/sbin/sshd -D"
project_name_ssh_1 0.0.0.0:32798->22/tcp "/usr/sbin/sshd -D"
project_name_server_1 "bundle exec scripts/boot.rb"
Since nothing about the compose file or the Dockerfiles that it builds has changed in the last week or between runs of that command, I'm at a loss to explain why compose sometimes creates the network correctly and sometimes does not.
I am seeing this same behavior with docker-compose on Windows, and some of my colleagues are seeing it on Mac.
Update:
If I run docker ps (dps is a custom PowerShell function I have that returns docker ps output as PowerShell objects instead of a table string) enough times during a failed run, I see odd behavior: the server container runs first with a listener, then Docker removes the listener while it creates the other containers, then it adds their listeners and removes the listener from the server container, and then never bothers to re-add the listener to the server.
dps | ft names, ports, command
Names Ports Command
----- ----- -------
project_name_server_1 "bundle exec scripts/boot.rb"
dps | ft names, ports, command
Names Ports Command
----- ----- -------
project_name_ssh_2 "/usr/sbin/sshd -D"
project_name_ssh_3 "/usr/sbin/sshd -D"
project_name_ssh_1 "/usr/sbin/sshd -D"
project_name_server_1 0.0.0.0:4567->4567/tcp "bundle exec scripts/boot.rb"
dps | ft names, ports, command
Names Ports Command
----- ----- -------
project_name_ssh_2 "/usr/sbin/sshd -D"
project_name_ssh_3 "/usr/sbin/sshd -D"
project_name_ssh_1 "/usr/sbin/sshd -D"
project_name_server_1 "bundle exec scripts/boot.rb"
dps | ft names, ports, command
Names Ports Command
----- ----- -------
project_name_ssh_2 0.0.0.0:32867->22/tcp "/usr/sbin/sshd -D"
project_name_ssh_3 0.0.0.0:32869->22/tcp "/usr/sbin/sshd -D"
project_name_ssh_1 0.0.0.0:32868->22/tcp "/usr/sbin/sshd -D"
project_name_server_1 "bundle exec scripts/boot.rb"
The problem turned out to be an issue with the script scripts/boot.rb. That script would execute and then immediately crash. When it crashed, for some reason, the container would get removed from the network. When the project started using a more reliable script to start the web server, this issue went away.
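For anyone hitting something similar: one way to make a boot script "more reliable" in this sense is to have it retry startup failures instead of crashing immediately, so the container stays up long enough to keep its port binding. The contents of scripts/boot.rb aren't shown in the question, so this is only a minimal hypothetical sketch; start_server, MAX_RETRIES, and the simulated failure are all illustrative placeholders, not the project's actual code.

```ruby
#!/usr/bin/env ruby
# Hypothetical sketch: retry server startup instead of letting the
# container's main process crash on the first failure.

MAX_RETRIES = 5
$attempts = 0

def start_server
  # Placeholder for the real server start (e.g. a web server bound to
  # ENDPOINT_PORT). Here we simulate a service that fails on the first try.
  raise "port not ready" if $attempts == 1
  :running
end

status = nil
MAX_RETRIES.times do
  $attempts += 1
  begin
    status = start_server
    break
  rescue StandardError => e
    warn "boot attempt #{$attempts} failed: #{e.message}; retrying"
    sleep 1
  end
end

# Exit non-zero if the server never came up, so orchestration can see it.
abort "server failed to start after #{MAX_RETRIES} attempts" unless status == :running
puts "server running after #{$attempts} attempt(s)"
```

The key design point is that PID 1 in the container only exits (with a non-zero status) after exhausting its retries, rather than dying instantly during the window when compose is still wiring up the network and port bindings.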