I am running Docker on a Raspberry Pi 3 Model B Plus Rev 1.3, running Raspberry Pi OS with all packages up to date.
TL;DR
The healthchecks on a given container work fine for a while (around 30 minutes, sometimes less, sometimes more), but at some point they get "stuck", so the container stays marked healthy even when it no longer is. Is there a way to debug what is going on with the healthchecks, so I can figure out what is happening?
The healthcheck is not configured in the Dockerfile but in the YAML file I use to deploy the stack, as follows:
healthcheck:
  test: ["CMD-SHELL", "curl -f -s -o /dev/null https://my.domain.com/icon/none.png || exit 1"]
  start_period: 1m
  interval: 5s
  timeout: 2s
  retries: 3
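As a sanity check, the check command can also be run by hand inside the container (the container ID here is the one from the docker inspect output below); an exit code of 0 means the check passes:

$ docker exec ebfa93c5e815 curl -f -s -o /dev/null https://my.domain.com/icon/none.png; echo $?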
When I start the container I keep checking docker inspect, and I can see the healthchecks running every 5 seconds, as configured... but at some point they simply stop, and I have no idea why, as can be seen below:
pi@openhab:~ $ date
Thu Sep 30 01:45:46 UTC 2021
pi@openhab:~ $ docker inspect ebfa93c5e815
[
    {
        "Id": "ebfa93c5e815592879b6862b33a1a384cc43b60093f8df5c1a8d51ba25a7d0ef",
        "Created": "2021-09-30T00:36:17.319888926Z",
        "Path": "/entrypoint.sh",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 3743,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2021-09-30T00:36:24.648900024Z",
            "FinishedAt": "0001-01-01T00:00:00Z",
            "Health": {
                "Status": "healthy",
                "FailingStreak": 0,
                "Log": [
                    {
                        "Start": "2021-09-30T01:05:37.394601872Z",
                        "End": "2021-09-30T01:05:38.510395101Z",
                        "ExitCode": 0,
                        "Output": ""
                    },
                    {
                        "Start": "2021-09-30T01:05:43.538165679Z",
                        "End": "2021-09-30T01:05:44.701265903Z",
                        "ExitCode": 0,
                        "Output": ""
                    },
                    {
                        "Start": "2021-09-30T01:05:49.731086207Z",
                        "End": "2021-09-30T01:05:50.940299522Z",
                        "ExitCode": 0,
                        "Output": ""
                    },
                    {
                        "Start": "2021-09-30T01:05:55.971634397Z",
                        "End": "2021-09-30T01:05:57.222192641Z",
                        "ExitCode": 0,
                        "Output": ""
                    },
                    {
                        "Start": "2021-09-30T01:06:02.251407253Z",
                        "End": "2021-09-30T01:06:03.402660632Z",
                        "ExitCode": 0,
                        "Output": ""
                    }
                ]
            }
        },
As can be seen, the healthchecks were running fine until about 30 minutes after the container came up, and then they simply stopped. The current time is 40 minutes after the last healthcheck.
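Rather than scrolling through the full docker inspect output every time, the health block can also be pulled out on its own with a Go template, which makes it easy to watch:

$ docker inspect --format '{{json .State.Health}}' ebfa93c5e815
$ watch -n 5 "docker inspect --format '{{.State.Health.Status}}' ebfa93c5e815"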
Versions
$ docker version
Client:
 Version: 18.09.1
 API version: 1.39
 Go version: go1.11.6
 Git commit: 4c52b90
 Built: Fri, 13 Sep 2019 10:45:43 +0100
 OS/Arch: linux/arm
 Experimental: false

Server:
 Engine:
  Version: 18.09.1
  API version: 1.39 (minimum version 1.12)
  Go version: go1.11.6
  Git commit: 4c52b90
  Built: Fri Sep 13 09:45:43 2019
  OS/Arch: linux/arm
  Experimental: false
pi@openhab:~ $ docker info
Containers: 41
 Running: 6
 Paused: 0
 Stopped: 35
Images: 51
Server Version: 18.09.1
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: jze7gn1w7y5fuk9ykv9omvuwh
 Is Manager: true
 ClusterID: 0zmswkmc5o699wichuas93j83
 Managers: 1
 Nodes: 1
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.2.104
 Manager Addresses:
  192.168.2.104:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 1.0.0~rc6+dfsg1-3
init version: v0.18.0 (expected: fec3683b971d9c3ef73f284f176672c44b448662)
Security Options:
 seccomp
  Profile: default
Kernel Version: 5.10.60-v7+
Operating System: Raspbian GNU/Linux 10 (buster)
OSType: linux
Architecture: armv7l
CPUs: 4
Total Memory: 923.2MiB
Name: openhab
ID: IL4N:6VFR:HOFK:7DL7:KMAS:PCNQ:7KOD:2JOM:R6I2:A5GD:HO7E:4CJQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
What I am trying to do
I have an openHAB installation running on the Raspberry Pi, which I want to be able to access remotely.
The Pi is connected to a router, which is connected to a modem. I don't have a static IP, and I don't want to set up a dynamically updated hostname pointing at my IP, configure port forwarding on the modem and router, and so on. Instead, I have a paid server with a static IP, so I simply run SSH from the Pi to that remote server with a reverse port forward, which lets me reach openHAB through the remote server. I want this SSH connection to start automatically when the Pi boots, and if for whatever reason I cannot reach some resource remotely (essentially the curl test in the healthcheck), the connection should be restarted.
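For reference, once the tunnel is up, openHAB should answer on the remote server's loopback. A quick check from a shell on the remote server (assuming openHAB's UI answers plain HTTP on the forwarded port, which is what the -R option below sets up) would be:

$ curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:17280/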
I have created a Docker image with the following Dockerfile:
FROM alpine:3.11
RUN apk add --no-cache \
    curl \
    openssh-client \
    ca-certificates \
    bash
COPY known_hosts /known_hosts
COPY private_key /private_key
RUN chmod 0400 /private_key
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT [ "/entrypoint.sh" ]
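For completeness, the image is built on the Pi and the stack deployed through swarm; the image tag, compose file name, and stack name here are just examples, not my actual names:

$ docker build -t local/ssh-client .
$ docker stack deploy -c ssh-client.yml ssh-client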
And the entrypoint.sh is simply:
#!/bin/bash
# -N: no remote command, -n: read stdin from /dev/null;
# -R reverse-forwards port 17280 on the server's loopback to openhab:8080
ssh -Nn user@my.domain.com -i /private_key -o UserKnownHostsFile=/known_hosts -R 127.0.0.1:17280:openhab:8080
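As an aside, independently of the healthcheck, ssh itself can be told to notice a dead connection and exit, so that swarm restarts the container even when no healthcheck fires. A variant of the entrypoint using standard OpenSSH options would be:

#!/bin/bash
# exec replaces the shell so ssh receives the container's stop signal directly;
# ServerAliveInterval/ServerAliveCountMax make ssh exit after ~90s of silence;
# ExitOnForwardFailure turns a failed -R binding into a fatal error
exec ssh -Nn user@my.domain.com -i /private_key \
    -o UserKnownHostsFile=/known_hosts \
    -o ServerAliveInterval=30 \
    -o ServerAliveCountMax=3 \
    -o ExitOnForwardFailure=yes \
    -R 127.0.0.1:17280:openhab:8080

Even so, the healthcheck remains the main restart mechanism in my setup.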
Now, this works great while the healthchecks are running... I can reboot the remote server, and swarm restarts the ssh-client container; I can stop openHAB, and swarm restarts the ssh-client; I can disconnect the Pi from the internet, and swarm restarts the ssh-client. This all works exactly as I expect, until for whatever reason the healthchecks simply stop, and the container stays "healthy" forever... I still have 60% free RAM and 62% free disk space. Does anyone have an idea what might be happening, or any suggestions? I cannot find any logs either...
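Regarding logs: on Raspbian the Docker daemon logs to the systemd journal, so any daemon-side healthcheck errors should be visible with something like:

$ journalctl -u docker.service --since "1 hour ago"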
Update: this issue appears to no longer be happening. I upgraded to Raspbian bullseye, and the healthchecks have now been running for a week straight without issues.
pi@openhab:~ $ docker version
Client:
 Version: 20.10.5+dfsg1
 API version: 1.41
 Go version: go1.15.9
 Git commit: 55c4c88
 Built: Sat Dec 4 10:53:03 2021
 OS/Arch: linux/arm
 Context: default
 Experimental: true

Server:
 Engine:
  Version: 20.10.5+dfsg1
  API version: 1.41 (minimum version 1.12)
  Go version: go1.15.9
  Git commit: 363e9a8
  Built: Sat Dec 4 10:53:03 2021
  OS/Arch: linux/arm
  Experimental: false
 containerd:
  Version: 1.4.13~ds1
  GitCommit: 1.4.13~ds1-1~deb11u1
 runc:
  Version: 1.0.0~rc93+ds1
  GitCommit: 1.0.0~rc93+ds1-5
 docker-init:
  Version: 0.19.0
  GitCommit: