Search code examples
postgresqlpgpool

PGpool Promote to master and Slaves wont replicate on failure


(version pgpool-II-pg93-3.4.0-3pgdg.rhel6.x86_64.rpm)

I’ve seen some chat online about this from a while back, but I haven’t found a solve.

I have 2 pgpool servers running streaming replication and load balancing. They connect to 3 postgres servers. 2 are virtual machines and 1 Physical machine.

I’d prefer the physical machine to be the master server. I have done test using the pcp_promote_node command, and I see that the show pool_nodes; shows the node I select as primary, however on postgres side the command SELECT pg_is_in_recovery(); show that server still in recovery.

All the pcp_promote command does, is detach master node. Which then prevents insert into the DB.

Secondly If the master fails, another server does become master, however the other slave is clueless of this and does not replicate any new data until I recover it from the new master.

Is there a work around for the above 2 problems. Also please let me know if u need more info, such as logs etc etc.

Thanks


Solution

  • Ok so. By design, pgpool does the above.

    The follow_master_command in the pgpool.conf file is what you need to use to resolve this issue.

    I use the following script (follow_master.sh which i place in the /etc/pgpool-II folder)

    #!/bin/sh
    
    ################
    ##
    ## $1 = node id
    ## $2 = Old master node id
    ## $3 = node hostname
    ##
    ############### 
    PGPOOLIP=10.**.**.**    
    PGUSER=postgres
    PGPASS=*************
    PGHOME=/var/lib/pgsql/9.3
    REMOTE_PGDATA=/var/lib/pgsql/9.3/data
    
    if [ $1 = $2 ]; then
            /usr/bin/pcp_detach_node 10 $PGPOOLIP 9898 $PGUSER $PGPASS $1
    else
            sleep 5
            ssh -T postgres@$3 "
            LD_LIBRARY_PATH=$PGHOME/lib:LD_LIBRARH_PATH;
            $PGHOME/bin/pg_ctl -w -D $REMOTE_PGDATA stop"
            /usr/bin/pcp_detach_node 10 $PGPOOLIP 9898 $PGUSER $PGPASS $1
            /usr/bin/pcp_recovery_node 10 $PGPOOLIP 9898 $PGUSER $PGPASS $1
            /usr/bin/pcp_attach_node 10 $PGPOOLIP 9898 $PGUSER $PGPASS $1
    fi
    

    I've done testing and it seems to be working fine.

    I hope this helps someone in the future.

    Thanks

    Rob