I've created a Docker image with PostgreSQL and repmgrd, all launched with supervisor.
My problem is that, once the container is up, the repmgrd spawned by supervisor seems to die and another instance takes its place. As a result I cannot control it with supervisorctl and have to resort to pkill or similar to manage it.
Dockerfile
FROM postgres:10
RUN apt-get -qq update && \
apt-get -qq install -y \
apt-transport-https \
lsb-release \
openssh-server \
postgresql-10-repmgr \
rsync \
supervisor > /dev/null && \
apt-get -qq autoremove -y && \
apt-get -qq clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# public keys configuration for passwordless login
COPY ssh/ /var/lib/postgresql/.ssh/
# postgres, sshd, supervisor and repmgr configuration
COPY etc/ /etc/
# helper scripts and entrypoint
COPY helpers/ /usr/local/bin/
ENTRYPOINT ["/usr/local/bin/pg-docker-entrypoint.sh"]
The pg-docker-entrypoint.sh script does little more than launch supervisord -c /etc/supervisor/supervisord.conf.
supervisord.conf
[unix_http_server]
file = /var/run/supervisor.sock
chmod = 0770
chown = root:postgres
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl = unix:///var/run/supervisor.sock
[supervisord]
logfile = /var/log/supervisor/supervisor.log
childlogdir = /var/log/supervisor
pidfile = /var/run/supervisord.pid
nodaemon = true
[program:sshd]
command = /usr/sbin/sshd -D -e
stdout_logfile = /var/log/supervisor/sshd-stdout.log
stderr_logfile = /var/log/supervisor/sshd-stderr.log
[program:postgres]
command = /docker-entrypoint.sh postgres -c config_file=/etc/postgresql/10/main/postgresql.conf
stdout_logfile = /var/log/supervisor/postgres-stdout.log
stderr_logfile = /var/log/supervisor/postgres-stderr.log
[program:repmgrd]
command = bash -c "sleep 10 && /usr/local/bin/repmgr_helper.sh"
user = postgres
stdout_logfile = /var/log/supervisor/repmgr-stdout.log
stderr_logfile = /var/log/supervisor/repmgr-stderr.log
[group:jm]
programs = sshd, postgres, repmgrd
The repmgr_helper.sh script does little more than run /usr/lib/postgresql/10/bin/repmgrd --verbose.
repmgr.conf
node_id=1
node_name='pg-dock-1'
conninfo='host=pg-dock-1 port=5432 user=repmgr dbname=repmgr connect_timeout=60'
data_directory='/var/lib/postgresql/data/'
use_replication_slots=1
pg_bindir='/usr/lib/postgresql/10/bin/'
failover='automatic'
promote_command='/usr/bin/repmgr standby promote --log-to-file'
follow_command='/usr/bin/repmgr standby follow --log-to-file -W --upstream-node-id=%n'
ps output
root@9f39cb085506:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 11:54 ? 00:00:00 bash /usr/local/bin/pg-docker-entrypoint.sh
root 10 1 0 11:54 ? 00:00:01 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
root 13 10 0 11:54 ? 00:00:00 /usr/sbin/sshd -D -e
postgres 15 10 0 11:54 ? 00:00:07 postgres -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres 36 15 0 11:54 ? 00:00:00 postgres: checkpointer process
postgres 37 15 0 11:54 ? 00:00:00 postgres: writer process
postgres 38 15 0 11:54 ? 00:00:00 postgres: wal writer process
postgres 39 15 0 11:54 ? 00:00:00 postgres: autovacuum launcher process
postgres 40 15 0 11:54 ? 00:00:00 postgres: archiver process
postgres 41 15 0 11:54 ? 00:00:01 postgres: stats collector process
postgres 42 15 0 11:54 ? 00:00:00 postgres: bgworker: logical replication launcher
postgres 51 15 0 11:54 ? 00:00:00 postgres: wal sender process repmgr 10.0.14.4(33812) streaming 0/4002110
postgres 55 15 0 11:54 ? 00:00:00 postgres: repmgr repmgr 10.0.14.4(33824) idle
postgres 88 15 0 11:54 ? 00:00:01 postgres: repmgr repmgr 10.0.14.5(33496) idle
postgres 90 1 0 11:54 ? 00:00:03 /usr/lib/postgresql/10/bin/repmgrd --verbose
root 107 0 0 11:54 pts/0 00:00:00 bash
root 9323 107 0 12:50 pts/0 00:00:00 ps -ef
As you can see, the repmgrd process is now a child of the entrypoint (PID 1) instead of being a child of supervisor, as sshd and postgres are. I've tried launching the command directly (no "helper"), I've tried using bash -c, and I've tried specifying /usr/bin/repmgrd as the executable, but no matter what I try I always end up with this result.
My question is therefore two-fold: why does this happen, and what can I do to keep the repmgrd process under supervisor's control?
Edit: As suggested, I tried passing --daemonize=false when starting repmgrd. This helps, but not completely. See the output:
root@6ab09e13f425:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 17:06 ? 00:00:00 bash /usr/local/bin/pg-docker-entrypoint.sh
root 11 1 2 17:06 ? 00:00:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
root 14 11 0 17:06 ? 00:00:00 /usr/sbin/sshd -D -e
postgres 15 11 0 17:06 ? 00:00:00 bash /usr/local/bin/repmgr_helper.sh
postgres 16 11 1 17:06 ? 00:00:00 postgres -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres 37 16 0 17:06 ? 00:00:00 postgres: checkpointer process
postgres 38 16 0 17:06 ? 00:00:00 postgres: writer process
postgres 39 16 0 17:06 ? 00:00:00 postgres: wal writer process
postgres 40 16 0 17:06 ? 00:00:00 postgres: autovacuum launcher process
postgres 41 16 0 17:06 ? 00:00:00 postgres: archiver process
postgres 42 16 0 17:06 ? 00:00:00 postgres: stats collector process
postgres 43 16 0 17:06 ? 00:00:00 postgres: bgworker: logical replication launcher
postgres 44 16 0 17:06 ? 00:00:00 postgres: wal sender process repmgr 10.0.23.136(47132) streaming 0/4008E28
root 45 0 0 17:06 pts/0 00:00:00 bash
postgres 77 15 1 17:06 ? 00:00:00 /usr/lib/postgresql/10/bin/repmgrd --daemonize=false --verbose
postgres 78 16 0 17:06 ? 00:00:00 postgres: repmgr repmgr 10.0.23.136(47150) idle
postgres 79 16 0 17:06 ? 00:00:00 postgres: repmgr repmgr 10.0.23.134(43476) idle
root 86 45 0 17:06 pts/0 00:00:00 ps -ef
root@6ab09e13f425:/# supervisorctl stop jm:repmgrd
jm:repmgrd: stopped
root@6ab09e13f425:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 17:06 ? 00:00:00 bash /usr/local/bin/pg-docker-entrypoint.sh
root 11 1 1 17:06 ? 00:00:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
root 14 11 0 17:06 ? 00:00:00 /usr/sbin/sshd -D -e
postgres 16 11 0 17:06 ? 00:00:00 postgres -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres 37 16 0 17:06 ? 00:00:00 postgres: checkpointer process
postgres 38 16 0 17:06 ? 00:00:00 postgres: writer process
postgres 39 16 0 17:06 ? 00:00:00 postgres: wal writer process
postgres 40 16 0 17:06 ? 00:00:00 postgres: autovacuum launcher process
postgres 41 16 0 17:06 ? 00:00:00 postgres: archiver process
postgres 42 16 0 17:06 ? 00:00:00 postgres: stats collector process
postgres 43 16 0 17:06 ? 00:00:00 postgres: bgworker: logical replication launcher
postgres 44 16 0 17:06 ? 00:00:00 postgres: wal sender process repmgr 10.0.23.136(47132) streaming 0/4008E60
root 45 0 0 17:06 pts/0 00:00:00 bash
postgres 77 1 0 17:06 ? 00:00:00 /usr/lib/postgresql/10/bin/repmgrd --daemonize=false --verbose
postgres 78 16 0 17:06 ? 00:00:00 postgres: repmgr repmgr 10.0.23.136(47150) idle
postgres 79 16 0 17:06 ? 00:00:00 postgres: repmgr repmgr 10.0.23.134(43476) idle
root 106 45 0 17:07 pts/0 00:00:00 ps -ef
At startup the process stays under supervisor, but stopping it only kills repmgr_helper.sh, so the "real" repmgrd process stays alive and is reparented to PID 1.
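The same effect can be reproduced outside of supervisor. The following is only a minimal sketch under stated assumptions, not the actual setup: `sleep 300` stands in for repmgrd, an inline `bash -c` wrapper stands in for repmgr_helper.sh, and a plain `kill -TERM` stands in for `supervisorctl stop` (TERM being supervisor's default stop signal). All variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: `sleep 300` stands in for repmgrd, the inline wrapper for
# repmgr_helper.sh, and `kill -TERM` for `supervisorctl stop`.
pidfile=$(mktemp)

# The wrapper starts its child as an ordinary sub-process (no exec),
# records the child's PID, and then waits for it.
bash -c 'sleep 300 >/dev/null 2>&1 & echo $! > "$1"; wait' _ "$pidfile" &
wrapper_pid=$!

sleep 1                                    # let the wrapper start its child
child_pid=$(cat "$pidfile")

kill -TERM "$wrapper_pid"                  # signal only the wrapper
wait "$wrapper_pid" 2>/dev/null || true    # reap it; it exits on the signal

# The signal was never delivered to the child, so it is still running.
if kill -0 "$child_pid" 2>/dev/null; then
    child_survived=yes
else
    child_survived=no
fi
echo "child survived the stop: $child_survived"

kill "$child_pid" 2>/dev/null || true      # clean up the stand-in process
rm -f "$pidfile"
```

The stand-in child should still be alive after the wrapper is gone, which mirrors what the second ps listing shows for PID 77.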
This isn't ideal, because now the process is alive while supervisor thinks it is not. As a result, issuing supervisorctl start jm:repmgrd fails with:
[ERROR] PID file "/tmp/repmgrd.pid" exists and seems to contain a valid PID
[HINT] if repmgrd is no longer alive, remove the file and restart repmgrd
Updated answer based on the discussion in the comments:
These are the issues with the current solution:
1. The original command to start repmgrd,
command = bash -c "sleep 10 && /usr/local/bin/repmgr_helper.sh"
runs bash, which executes another bash script (a second instance of bash), which then runs repmgrd. That is too many processes, most of them unnecessary.
2. supervisord expects the invoked command to remain in the foreground, but repmgrd daemonizes itself by default.
3. While troubleshooting, there were some issues with the PID file generated by repmgrd.
These can be fixed with the following changes:
1. Use this command instead:
command = /usr/local/bin/repmgr_helper.sh
2. Update /usr/local/bin/repmgr_helper.sh to run sleep 10 as its first step.
3. As its very last step, /usr/local/bin/repmgr_helper.sh should invoke repmgrd as follows:
exec /path/to/repmgrd --daemonize=false --no-pid-file
This way, (a) because of the exec, repmgrd replaces the script that starts it, (b) it doesn't daemonize itself, and (c) it doesn't generate a PID file.
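The effect of exec can be checked in isolation. The sketch below is an illustration, not part of the actual helper: the inner `bash -c` calls are stand-ins for the wrapper script and for repmgrd, and the variable and function names are made up. Each shell prints its own PID, so the two output lines show whether the started command kept the wrapper's PID.

```shell
#!/bin/sh
# Each inner shell prints its own PID: first the wrapper, then the command it starts.

# Without exec: the command runs as a child of the wrapper and gets a new PID.
# (The trailing `true` ensures the command is not the last one in the list,
# so bash cannot optimize it into an exec on its own.)
without_exec=$(bash -c 'echo "$$"; bash -c "echo \$\$"; true')

# With exec: the command replaces the wrapper and keeps the wrapper's PID.
with_exec=$(bash -c 'echo "$$"; exec bash -c "echo \$\$"')

# Small helpers to pick the first or second line of a two-line string.
first_line()  { printf '%s\n' "$1" | sed -n 1p; }
second_line() { printf '%s\n' "$1" | sed -n 2p; }

if [ "$(first_line "$with_exec")" = "$(second_line "$with_exec")" ]; then
    exec_keeps_pid=yes
else
    exec_keeps_pid=no
fi
if [ "$(first_line "$without_exec")" = "$(second_line "$without_exec")" ]; then
    plain_keeps_pid=yes
else
    plain_keeps_pid=no
fi
echo "with exec, same PID: $exec_keeps_pid"
echo "without exec, same PID: $plain_keeps_pid"
```

Because supervisor signals the PID it started, keeping that PID for repmgrd itself is what lets supervisorctl stop reach the real process.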
Original answer (before the updates)
In the start command, try passing --daemonize=false to repmgrd.
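Why the flag matters can be sketched with stand-ins. A program that daemonizes itself moves the real work into a background process and lets the process that was launched exit, while a supervisor only tracks the process it launched. In the sketch below, which is an assumption-laden illustration rather than how repmgrd is implemented, the inline `bash -c` launcher imitates that behavior and `sleep 300` stands in for the daemon; all names are hypothetical.

```shell
#!/bin/sh
# Sketch only: the inline launcher imitates a self-daemonizing program;
# `sleep 300` stands in for the long-running daemon.
pidfile=$(mktemp)

# The launcher puts the daemon in the background, records its PID, and exits.
bash -c 'sleep 300 >/dev/null 2>&1 & echo $! > "$1"' _ "$pidfile" &
launcher_pid=$!

# From the supervisor's point of view the program has already finished.
wait "$launcher_pid"
launcher_status=$?

daemon_pid=$(cat "$pidfile")
if kill -0 "$daemon_pid" 2>/dev/null; then
    daemon_running=yes
else
    daemon_running=no
fi
echo "launcher exit status: $launcher_status, daemon still running: $daemon_running"

kill "$daemon_pid" 2>/dev/null || true     # clean up the stand-in process
rm -f "$pidfile"
```

With --daemonize=false the launched process is the long-running one, so there is no detached background process for supervisor to lose track of.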