I am trying to implement keepalived-based failover for my service. Below are my configurations for the master and backup nodes.
Master node:
vrrp_script chk_splunkd {
    script "pidof splunkd"
    interval 2
    fall 2
    rise 2
}
vrrp_instance VI_1 {
    interface eth0
    state MASTER
    advert_int 1
    virtual_router_id 51
    priority 200
    nopreempt
    smtp_alert
    authentication {
        auth_type PASS
        auth_pass passme
    }
    virtual_ipaddress {
        10.126.246.245
    }
    track_script {
        chk_splunkd
    }
    notify_master /etc/keepalived/scripts/master.sh
    notify_backup /etc/keepalived/scripts/stop_service.sh
    notify_fault /etc/keepalived/scripts/stop_service.sh
}
Backup node:
vrrp_script chk_splunkd {
    script "pidof splunkd"
    interval 2
    fall 2
    rise 2
}
vrrp_instance VI_1 {
    interface eth0
    state BACKUP
    advert_int 1
    virtual_router_id 51
    priority 100
    nopreempt
    smtp_alert
    authentication {
        auth_type PASS
        auth_pass passme
    }
    virtual_ipaddress {
        10.126.246.245
    }
    track_script {
        chk_splunkd
    }
    notify_master /etc/keepalived/scripts/master.sh
    notify_backup /etc/keepalived/scripts/stop_service.sh
    notify_fault /etc/keepalived/scripts/stop_service.sh
}
However, even when one node goes into the fault state and stops sending VRRP advertisements, the other node does not automatically transition to the master state. When I monitor the VRRP advertisement packets using tcpdump -vv -i eth0 vrrp, I see that after the advertisements from one node stop, the other node never starts sending advertisements of its own, which it would do if it had become the master.
Please help me find out what I'm missing.
Thanks,
Keerthana
The issue was that during startup, when one node became the master, the other one went into fault state because of the pidof splunkd check. That command returns 1 on the backup node, since my Splunk service should be up only on the master node. Once I edited the notify scripts to write the current state to an external file, and to read that state back when deciding what action to take, things started working fine.
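For anyone hitting the same problem, here is a minimal sketch of the state-file idea, not my exact scripts. The state file path and the function names (record_state, read_state, check_splunkd) are made up for illustration, and using the recorded state inside the health check is one possible way to apply it: the check only insists on a running splunkd when the node has recorded itself as MASTER, so a backup node with splunkd stopped is not pushed into fault state.

```shell
#!/bin/sh
# Hypothetical state file; this path is an assumption, not taken from my setup.
STATE_FILE="${STATE_FILE:-/var/run/keepalived.state}"

# Called from the notify scripts: record the state keepalived just entered.
record_state() {
    printf '%s\n' "$1" > "$STATE_FILE"
}

# Print the last recorded state, or BACKUP if nothing has been recorded yet.
read_state() {
    if [ -r "$STATE_FILE" ]; then
        cat "$STATE_FILE"
    else
        echo BACKUP
    fi
}

# Health check: require a running splunkd only while this node is MASTER.
# On a BACKUP or FAULT node, splunkd is expected to be stopped, so succeed.
check_splunkd() {
    if [ "$(read_state)" = "MASTER" ]; then
        pidof splunkd > /dev/null
    else
        return 0
    fi
}
```

With something like this, master.sh would start splunkd and then call record_state MASTER, stop_service.sh would call record_state BACKUP (or FAULT), and the vrrp_script would point at a small wrapper that calls check_splunkd instead of running pidof splunkd directly. Recording MASTER only after splunkd has started avoids the check failing during the takeover itself.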