Environment:
I am using Pacemaker (pacemaker-1.1.19-8.el7_6.1.x86_64) and Corosync (corosync-2.4.3-2.el7_5.1.x86_64) on CentOS 7.5. The PostgreSQL version is 9.3.21. I have a two-node cluster with nodes failover1 and failover2, and a master/slave clone resource of Postgresql9. Below are the pcs commands that build the CIB:
pcs resource create Postgresql9 ocf:heartbeat:pgsql \
pgctl="/usr/pgsql-9.3/bin/pg_ctl" psql="/usr/pgsql-9.3/bin/psql" pgdata="/var/lib/pgsql/9.3/data/" start_opt="-p 5432" rep_mode="async" node_list="failover1 failover2" restore_command="" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip="10.10.17.165" restart_on_promote="true" \
op monitor interval="20s" role="Slave" timeout="100s" \
op monitor interval="10s" role="Master" timeout="100s" \
op start interval="0" timeout="250s" \
op promote interval="0" timeout="70s" \
op stop interval="0" timeout="70s" \
op demote interval="0" timeout="200s" \
op notify interval="0" timeout="200s" \
meta failure-timeout="2000s" \
meta migration-threshold="3"
pcs resource master mspostgres Postgresql9 master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" migration-threshold="3" target-role="Started"
pcs constraint location mspostgres prefers failover1=90
pcs constraint location mspostgres prefers failover2=80
pcs constraint colocation add DBClusterIP with Master mspostgres score=INFINITY
pcs constraint order stop DBClusterIP then demote mspostgres kind=Optional symmetrical=false
pcs property set cluster-recheck-interval=5min
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs resource defaults migration-threshold=3
pcs property set placement-strategy=balanced
pcs property set stop-all-resources=false
pcs resource defaults failure-timeout=2000
pcs resource defaults resource-stickiness=100
pcs resource op defaults on-fail=restart
Issue: I can run this resource as master on failover1 and as slave on failover2. When I use the pcs resource ban command to move the master to failover2, I get the desired result. When I reboot failover1, failover2 becomes the master of Postgresql9. But once failover1 has booted up again, the resource runs as slave on both machines and neither instance gets promoted.
Expected scenario: Booting failover1 back up should not affect the Postgresql9 master that is already running on failover2.
Please help me understand how the cluster behaves when a node comes back online.
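One way to watch this from the cluster's side is to dump the node attributes that the pgsql agent maintains (the master score plus the status and data-status attributes) right after failover1 rejoins. The sketch below assumes, without confirmation from this post, that the attributes are named master-Postgresql9, Postgresql9-status and Postgresql9-data-status, and that crm_mon prints them as indented "+ name : value" lines under "* Node <name>:" headers; the exact layout may differ between Pacemaker versions, so treat the grep filter as an approximation.

```shell
#!/usr/bin/env bash
# Sketch: print the per-node attributes that ocf:heartbeat:pgsql keeps in the CIB.
# Assumptions (not from the original post): attribute names follow the
# "master-<resource>", "<resource>-status" and "<resource>-data-status" pattern,
# and crm_mon lists them under "* Node <name>:" headers.
RSC="${RSC:-Postgresql9}"

show_pgsql_attrs() {
    # -A: show node attributes, -1: print the cluster status once and exit
    crm_mon -A1 |
        grep -E "^\* Node |(master-${RSC}|${RSC}-status|${RSC}-data-status) "
}
```

Running show_pgsql_attrs on either node before and after failover1 rejoins lets you compare the master scores and data-status values of the two nodes, which is where a refusal to promote either instance should become visible.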
It turns out that the PostgreSQL service was enabled in systemd (systemctl enable), so when the server booted, systemd started PostgreSQL by itself, outside Pacemaker's control. Because that instance was not running in recovery, the resource agent's _monitor() function saw it as a primary and returned $OCF_RUNNING_MASTER, which in turn changed the Postgres-data-status attribute on the other node.
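A minimal sketch of the corresponding fix is to disable the unit on both nodes so that only the ocf:heartbeat:pgsql agent ever starts PostgreSQL. It assumes the PGDG unit for 9.3 on CentOS 7 is named postgresql-9.3.service; if yours differs, systemctl list-unit-files | grep postgres shows the real name, which can be passed in through the (hypothetical) PG_UNIT variable.

```shell
#!/usr/bin/env bash
# Sketch: keep systemd from starting PostgreSQL at boot, so only Pacemaker
# (ocf:heartbeat:pgsql) starts it. Run as root on every cluster node.
# Assumption: the unit is postgresql-9.3.service; override PG_UNIT otherwise.
PG_UNIT="${PG_UNIT:-postgresql-9.3.service}"

# Succeeds if systemd reports the unit as enabled for boot.
pg_unit_enabled_at_boot() {
    systemctl is-enabled --quiet "$PG_UNIT" 2>/dev/null
}

# Disables the unit if needed; it never starts or stops PostgreSQL itself.
ensure_pg_unit_disabled() {
    if pg_unit_enabled_at_boot; then
        echo "disabling $PG_UNIT"
        systemctl disable "$PG_UNIT"
    else
        echo "$PG_UNIT is not enabled at boot"
    fi
}
```

After sourcing the script, run ensure_pg_unit_disabled on failover1 and failover2. The function deliberately only changes the boot-time setting: stopping a running PostgreSQL through systemctl would be another change made behind Pacemaker's back.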