keepalived transitions not happening as expected

I am trying to implement keepalived-based failover for my service. My configurations for the master and backup nodes are below.

Master node:

vrrp_script chk_splunkd {
    script "pidof splunkd"
    interval 2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    advert_int 1
    virtual_router_id 51
    priority 200
    nopreempt
    smtp_alert
    authentication {
            auth_type PASS
            auth_pass passme
    }
    virtual_ipaddress {
            10.126.246.245
    }
    track_script {
            chk_splunkd
    }
    notify_master /etc/keepalived/scripts/master.sh
    notify_backup /etc/keepalived/scripts/stop_service.sh
    notify_fault /etc/keepalived/scripts/stop_service.sh
}

Backup node:

vrrp_script chk_splunkd {
    script "pidof splunkd"
    interval 2
    fall 2
    rise 2
}
vrrp_instance VI_1 {
    interface eth0
    state BACKUP
    advert_int 1
    virtual_router_id 51
    priority 100
    nopreempt
    smtp_alert
    authentication {
            auth_type PASS
            auth_pass passme
    }

    virtual_ipaddress {
           10.126.246.245
    }
    track_script {
            chk_splunkd
    }
    notify_master /etc/keepalived/scripts/master.sh
    notify_backup /etc/keepalived/scripts/stop_service.sh
    notify_fault /etc/keepalived/scripts/stop_service.sh
}

However, even when one node goes into the fault state and stops sending VRRP advertisements, the other node doesn't automatically transition to the master state. I monitored the VRRP advertisement packets using tcpdump -vv -i eth0 vrrp. Even after the advertisements from one node stop, the other node never starts sending advertisements of its own, which it would do if it had become the master.

Please help me find out what I'm missing.

Thanks,

Keerthana

Solution

The issue was that at startup, once one node became the master, the other node went into the fault state. My splunk service should be up only on the master node, so on the other node the pidof splunkd command returns 1 and the tracked script fails. A node in the fault state cannot take over as master, which is why no transition happened when the master stopped advertising. Once I edited the notify scripts to write the current state to an external file, and to read that state back to decide what action to take, things started working fine.
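
For anyone hitting the same problem, here is a minimal sketch of the state-file idea. It is not my exact scripts, only one way this could be done. The state file path, the function names and the PROCESS_NAME variable are all assumptions made for illustration. The idea is that the health check requires splunkd only while the node believes it is the master, so a healthy backup no longer falls into the fault state.

```shell
#!/bin/sh
# Hypothetical state file location; any path writable by keepalived would do.
STATE_FILE="${STATE_FILE:-/var/run/keepalived.state}"

# Record the state keepalived has just entered (MASTER, BACKUP or FAULT).
# Intended to be called from the notify scripts.
write_state() {
    printf '%s\n' "$1" > "$STATE_FILE"
}

# Print the last recorded state, or BACKUP if nothing has been recorded yet.
read_state() {
    if [ -r "$STATE_FILE" ]; then
        cat "$STATE_FILE"
    else
        echo BACKUP
    fi
}

# Health check for use in vrrp_script: the process is required only on the
# master. On a backup (or faulted) node the check succeeds, so the node
# stays eligible to take over.
# PROCESS_NAME is a hypothetical override; it defaults to splunkd.
check_service() {
    if [ "$(read_state)" = "MASTER" ]; then
        pidof "${PROCESS_NAME:-splunkd}" > /dev/null
    else
        return 0
    fi
}
```

Under these assumptions, master.sh would call write_state MASTER after starting the service, stop_service.sh would call write_state with BACKUP or FAULT after stopping it, and the vrrp_script block would run a small wrapper that calls check_service instead of running pidof splunkd directly. Note that after a master fails its check and records FAULT, the check passes again, so the node returns to backup rather than staying faulted; whether that is desirable depends on the setup.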