I'm running Docker Desktop on Windows and am having a problem with containers becoming unresponsive after startup errors. It doesn't happen every time, but it does most of the time. Consequently, I have to be careful to start my containers one at a time, and if I see an error, I have to "Restart Docker Desktop" and begin the start-up sequence again.
I'm using docker-compose. As a specific example, this morning I started elasticsearch, zookeeper, then kafka. Kafka threw an exception about the zookeeper state and shut down, but now the kafka container is unresponsive in Docker. I can't stop it (it's already stopped?), yet it shows as running. I can't open a CLI into it, and I can't restart it. The only way forward is to restart Docker using the debug menu. (If I set restart: always, the containers do restart automatically, but since they keep throwing errors they just spin in circles, starting and dying, without my being able to stop/kill/remove the offending container.) Once I've restarted Docker, I can view the container's log and see the error that was thrown.
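As an aside, while investigating this I've stopped using restart: always so a crashing container doesn't loop forever. A minimal sketch of what I mean, using the kafka service from the extract further down:

kafka:
  restart: on-failure   # only restart after a non-zero exit
  # restart: "no"       # or disable automatic restarts entirely while debugging

Both on-failure and "no" are standard compose restart values; this doesn't fix the unresponsiveness, it just stops the start/die loop while I investigate.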
This happens with pretty much all of my containers. However, it does seem that if I start a container while viewing its log window in Docker Desktop, I'm somewhat more likely to be able to start it again after an error.
I've tried several different containers and this seems to be a common issue for us. It doesn't appear to relate to any specific settings I'm passing into the containers, but an extract from our docker-compose file is below:
volumes:
  zData:
  kData:
  eData:

zookeeper:
  container_name: zookeeper
  image: bitnami/zookeeper:latest
  environment:
    ALLOW_ANONYMOUS_LOGIN: "yes" # Dev only
    ZOOKEEPER_ROOT_LOGGER: WARN, CONSOLE
    ZOOKEEPER_CONSOLE_THRESHOLD: WARN
  ports:
    - "2181:2181"
  volumes:
    - zData:/bitnami/zookeeper:rw
  logging:
    driver: "fluentd"
    options:
      fluentd-address: localhost:24224
      tag: zookeeper
      fluentd-async-connect: "true"
kafka:
  container_name: kafka
  image: bitnami/kafka:latest
  depends_on:
    - zookeeper
  environment:
    ALLOW_PLAINTEXT_LISTENER: "yes" # Debug only
    KAFKA_ADVERTISED_PORT: 9092
    KAFKA_ADVERTISED_HOST_NAME: kafka
    KAFKA_CREATE_TOPICS: xx1_event:1:1,xx2_event:1:1,xx3_event:1:1,xx4_event:1:1
    KAFKA_JMX_OPTS: -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=${DOCKER_HOSTNAME} -Dcom.sun.management.jmxremote.rmi.port=9096 -Djava.net.preferIPv4Stack=true
    JMX_PORT: 9096
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
  hostname: kafka
  ports:
    - 9092:9092
    - 9096:9096
  volumes:
    - kData:/bitnami/kafka:rw
  logging:
    driver: "fluentd"
    options:
      fluentd-address: localhost:24224
      tag: kafka
      fluentd-async-connect: "true"
elasticsearch:
  image: bitnami/elasticsearch:latest
  container_name: elasticsearch
  cpu_shares: 2048
  environment:
    ELASTICSEARCH_HEAP_SIZE: "2048m"
    xpack.monitoring.enabled: "false"
  ports:
    - 9200:9200
    - 9300:9300
  volumes:
    - C:/config/elasticsearch.yml:/opt/bitnami/elasticsearch/config/my_elasticsearch.yml:rw
    - eData:/bitnami/elasticsearch/data:rw
I've wondered whether this could be a resourcing issue, but I'm running on a reasonably specced laptop (i7, SSD, 16 GB RAM) using WSL2 (it also happens under Hyper-V), and RAM limits don't appear to be approached. When there are no errors on startup, the system runs fine and uses far more resources.
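If resources do turn out to be relevant, one way to test that theory per container is to cap memory explicitly and see whether the hangs correlate with the limit being hit. A rough sketch, with an illustrative 3g cap (mem_limit works in the v2 file format; newer formats may want deploy.resources.limits instead):

elasticsearch:
  mem_limit: 3g      # illustrative hard cap, just to test the resourcing theory
  cpu_shares: 2048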
Any ideas on what I could try? I'm surprised more people aren't struggling with this.
There is currently an open issue, https://github.com/moby/moby/issues/40063, where containers will hang/freeze/become unresponsive when logging is set to fluentd in asynchronous mode and the fluentd container is not operational.
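If that's what's happening here, a workaround I'd assume helps (based on the issue description, not confirmed as an official fix) is to make the log connection synchronous so the container fails fast with a visible error when fluentd is unreachable, or to fall back to the default json-file driver while debugging:

kafka:
  logging:
    driver: "fluentd"
    options:
      fluentd-address: localhost:24224
      tag: kafka
      fluentd-async-connect: "false"   # connect at container start; fail immediately if fluentd is down
  # or, while debugging:
  # logging:
  #   driver: "json-file"

Making sure the fluentd container itself is up, and started before everything else, should also avoid the condition the issue describes.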