I have a three node patroni
postgres-11
cluster running.The patroni cluster settings are:
consul:
url: <consul URL>
bootstrap:
initdb:
encoding: UTF8
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
postgresql:
use_pg_rewind: true
use_slots: true
remove_data_directory_on_diverged_timelines: false
remove_data_directory_on_rewind_failure: true
parameters:
archive_command: /bin/true
archive_mode: 'on'
archive_timeout: 1800s
checkpoint_completion_target: 0.9
default_statistics_target: 100
effective_cache_size: 24GB
effective_io_concurrency: 200
hot_standby: 'on'
log_directory: /var/log/postgresql
log_filename: postgresql-%a.log
logging_collector: 'on'
maintenance_work_mem: 2GB
max_connections: 2000
max_parallel_maintenance_workers: 4
max_parallel_workers: 8
max_parallel_workers_per_gather: 4
max_replication_slots: 10
max_wal_senders: 10
max_wal_size: 8GB
max_worker_processes: 8
min_wal_size: 2GB
random_page_cost: 1.1
shared_buffers: 8GB
ssl: 'on'
ssl_cert_file: /etc/ssl/certs/ssl-cert-snakeoil.pem
ssl_key_file: /etc/ssl/private/ssl-cert-snakeoil.key
track_commit_timestamp: 'on'
wal_buffers: 16MB
wal_keep_segments: 64
wal_level: logical
wal_log_hints: 'on'
work_mem: 5242kB
postgresql:
listen: 0.0.0.0:5432
connect_address: <IP Address of the node>:5432
data_dir: /var/lib/postgresql/11/<db name>
pgpass: /tmp/pgpass0
authentication:
superuser:
username: postgres
password: postgres
replication:
username: replicator
password: replicator
rewind:
username: rewind_user
password: rewind_password
pg_hba:
- local all all peer
- host all all 127.0.0.1/32 md5
- hostssl all postgres 0.0.0.0/0 md5
- host all postgres 0.0.0.0/0 reject
- host all all 0.0.0.0/0 md5
- hostssl all postgres ::0/0 md5
- host all postgres ::0/0 reject
- host all all ::0/0 md5
- local replication all peer
- host replication all 127.0.0.1/32 md5
- hostssl replication replicator 0.0.0.0/0 md5
- hostssl replication replicator ::0/0 md5
- host replication replicator 0.0.0.0/0 md5
- host replication replicator 127.0.0.1/32 trust
- local postgres postgres peer
- host all all 0.0.0.0/0 md5
create_replica_methods:
- basebackup
basebackup:
max-rate: '250M'
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
restapi:
connect_address: <IP Address of the node>:8008
listen: 0.0.0.0:8008
When I shutdown the primary, I was hoping the synchronous standby
would become primary. But instead on both the standby nodes I see the following error:
INFO: following a different leader because i am not the healthiest node
Not sure why synchronous standby did not become primary.
I am using Patroni 1.6.1
on Ubuntu 18.04
I figured out the problem. For some reason the capital letters in the names of my nodes were converted to small letters so patroni failed to find the standby nodes. Like below:
{"leader":"Pa-db1","sync_standby":"pa-db2"}
The correct name of the node is Pa-db2 and similarly the second standby node name is Pa-db3. I changed all the nodes names to small letters i.e. pa-db1, pa-db2, pa-db3
and failover worked!