postgresql docker docker-compose

Replica refuses to accept connections, although the primary works just fine


I am currently running one replica container and one primary container within Docker Compose, and all my services in Compose are connected to the same network.

Bouncer for primary can connect to primary, and backend application can connect to bouncer:

❯ psql -h db-bouncer -p 6432 -U $POSTGRES_USER $POSTGRES_DB
Password for user dev: 
psql (16.4)
Type "help" for help.
foo=# exit
>

Bouncer for replica cannot connect to the replica:

pgbouncer 19:25:57.97 INFO  ==> Waiting for PostgreSQL backend to be accessible
timeout reached before the port went into state "inuse"
timeout reached before the port went into state "inuse"
...
pgbouncer 19:32:38.13 ERROR ==> Backend db-replica not accessible

and neither can the backend application:

❯ psql -h db-replica -p 5433 -U $POSTGRES_USER $POSTGRES_DB
psql: error: connection to server at "db-replica" (172.17.0.4), port 5433 failed: Connection refused
    Is the server running on that host and accepting TCP/IP connections?

The most confusing thing is that there are literally no signs of an error here. The config is literally the same as the primary's.

replica:

2024-09-24 19:26:04.317 GMT [32] LOG:  starting PostgreSQL 16.4 (Debian 16.4-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-09-24 19:26:04.317 GMT [32] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-09-24 19:26:04.317 GMT [32] LOG:  listening on IPv6 address "::", port 5432
2024-09-24 19:26:04.319 GMT [32] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-09-24 19:26:04.321 GMT [56] LOG:  database system was interrupted; last known up at 2024-09-24 19:26:03 GMT
2024-09-24 19:26:05.076 GMT [56] LOG:  entering standby mode
2024-09-24 19:26:05.076 GMT [56] LOG:  starting backup recovery with redo LSN 0/2000028, checkpoint LSN 0/2000060, on timeline ID 1
2024-09-24 19:26:05.078 GMT [56] LOG:  redo starts at 0/2000028
2024-09-24 19:26:05.078 GMT [56] LOG:  completed backup recovery with redo LSN 0/2000028 and end LSN 0/2000100
2024-09-24 19:26:05.078 GMT [56] LOG:  consistent recovery state reached at 0/2000100
2024-09-24 19:26:05.078 GMT [32] LOG:  database system is ready to accept read-only connections

The replica is working fine on port 5432. I have set the Compose port mapping to 5433:5432. postgresql.conf is the same. pg_hba.conf is the same. The port configuration is the same, except that it binds to 5433 instead of 5432. The initialization scripts are the same. The replica is literally a 1:1 clone of the primary Compose service. I have literally lost my ability to think rationally; this does not make any sense.

    db-replica:
        build:
          context: .
          dockerfile: db.replica.Dockerfile
        env_file:
          - '.env'
        ports:
            - ${POSTGRES_REPLICA_PORT:-5433}:5432
        volumes:
            - ./pg_hba.conf:/etc/postgresql/pg_hba.conf
            - ./postgresql.conf:/etc/postgresql/postgresql.conf
            - postgres_replica_data:/var/lib/postgresql/data/docker
        command: |
            bash -c "
                if [ -z \"$(ls -A /var/lib/postgresql/data/docker)\" ]; then
                  until pg_basebackup -h $POSTGRES_PRIMARY_HOST -U $POSTGRES_USER -p $POSTGRES_PRIMARY_PORT -D /var/lib/postgresql/data/docker -Fp -Xs -P -R -S slave
                  do
                    echo 'Waiting for primary to connect...'
                    sleep 1s
                  done
                  echo 'Backup done, starting replica...'
                  chmod 0700 /var/lib/postgresql/data
                  chmod 
                  bash /etc/postgresql/init-conf.sh
                else
                  echo 'Backup already exists, starting replica...'
                  bash /etc/postgresql/init-conf.sh
                fi
              "

init-conf.sh does nothing special; it just moves the config around and starts the server:

#!/bin/bash
set -e
[[ -e /etc/postgresql/pg_hba.conf ]] && cp /etc/postgresql/pg_hba.conf /var/lib/postgresql/data/docker/pg_hba.conf
[[ -e /etc/postgresql/postgresql.conf ]] && cp /etc/postgresql/postgresql.conf /var/lib/postgresql/data/docker/postgresql.conf
bash docker-entrypoint.sh -c 'max_connections=200'

I am certain that docker-entrypoint.sh does not modify anything, because I can literally see the 0.0.0.0:5432 binding in the replica's log:

2024-09-24 19:26:04.317 GMT [32] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-09-24 19:26:04.317 GMT [32] LOG:  listening on IPv6 address "::", port 5432

The Dockerfile also does nothing special; it just sets up the data folder and moves init-conf.sh to the appropriate place.

FROM postgres:16-bookworm

RUN apt update
RUN apt install -y -V ca-certificates lsb-release wget
# PGroonga installation steps omitted; they do not interfere with networking
RUN mkdir -p /docker-entrypoint-initdb.d
RUN mkdir -p /var/lib/postgresql/data/docker
COPY init-conf.sh /etc/postgresql/init-conf.sh
RUN chmod +x /etc/postgresql/init-conf.sh
COPY docker-entrypoint-initdb.d /docker-entrypoint-initdb.d

And the primary? IT. IS. THE. SAME.

    db:
        build:
          context: .
          # Dockerfile is the same as the replica's, except that init-conf is run as an initialization script and docker-entrypoint is removed, BUT THAT CHANGES NOTHING RELATED TO NETWORKING
          dockerfile: db.Dockerfile
        env_file:
          - '.env'
        ports:
            - ${POSTGRES_PRIMARY_PORT:-5432}:5432
        volumes:
            - ./pg_hba.conf:/etc/postgresql/pg_hba.conf
            - ./postgresql.conf:/etc/postgresql/postgresql.conf
            - postgres_primary_data:/var/lib/postgresql/data/docker
        command: |
            postgres 
            -c wal_level=replica 
            -c hot_standby=on 
            -c max_wal_senders=10 
            -c max_replication_slots=10 
            -c hot_standby_feedback=on

It starts up just the same:

2024-09-24 19:26:02.264 GMT [1] LOG:  starting PostgreSQL 16.4 (Debian 16.4-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-09-24 19:26:02.264 GMT [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-09-24 19:26:02.264 GMT [1] LOG:  listening on IPv6 address "::", port 5432
2024-09-24 19:26:02.266 GMT [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-09-24 19:26:02.268 GMT [73] LOG:  database system was shut down at 2024-09-24 19:26:02 GMT
2024-09-24 19:26:02.270 GMT [1] LOG:  database system is ready to accept connections

The bouncer can connect to the primary just fine:

pgbouncer 19:25:57.96 INFO  ==> Validating settings in PGBOUNCER_* env vars...
pgbouncer 19:25:57.96 INFO  ==> Initializing PgBouncer...
pgbouncer 19:25:57.97 INFO  ==> Waiting for PostgreSQL backend to be accessible
pgbouncer 19:26:02.49 INFO  ==> Backend db:5432 accessible
pgbouncer 19:26:02.49 INFO  ==> Configuring credentials
pgbouncer 19:26:02.50 INFO  ==> Creating configuration file
pgbouncer 19:26:02.60 INFO  ==> Loading custom scripts...
pgbouncer 19:26:02.61 INFO  ==> ** PgBouncer setup finished! **

Solution

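  • You are connecting to port 5433 directly on the container, but 5433 is only the port you published on the host. The mapping 5433:5432 does not change the port PostgreSQL listens on inside the container. The error message shows the mismatch:

    "db-replica" (172.17.0.4), port 5433 failed

    From other containers on the Compose network, connect to db-replica on port 5432 (where PostgreSQL is listening in the container) instead. The same likely applies to the replica's bouncer if it is configured with port 5433.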
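
As a rough illustration of the host-port versus container-port distinction, here is a minimal shell sketch. The `port_for` helper is hypothetical and not part of the original setup; it assumes a mapping written as `[IP:]HOST_PORT:CONTAINER_PORT` with no protocol suffix, and it only prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not from the question):
# given a Compose port mapping of the form [IP:]HOST_PORT:CONTAINER_PORT,
# print the port a client should use depending on where it runs.
port_for() {
  local mapping="$1" where="$2"
  local without_container="${mapping%:*}"   # drop the trailing ":CONTAINER_PORT"
  case "$where" in
    host)    printf '%s\n' "${without_container##*:}" ;;  # published port, used from the host
    network) printf '%s\n' "${mapping##*:}" ;;            # container port, used via the service name
    *)       echo "usage: port_for MAPPING host|network" >&2; return 1 ;;
  esac
}

# The replica's mapping from the Compose file, with its default applied.
mapping="${POSTGRES_REPLICA_PORT:-5433}:5432"

# Print (do not run) the connection command for each vantage point.
echo "from another container: psql -h db-replica -p $(port_for "$mapping" network)"
echo "from the host:          psql -h localhost -p $(port_for "$mapping" host)"
```

With POSTGRES_REPLICA_PORT unset, the first line uses port 5432 and the second uses port 5433. Only the second form goes through the published port; anything that addresses the container by its service name bypasses the mapping entirely.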