amazon-ec2 redis sentinel redis-sentinel

Redis fail over with sentinel not working

I am trying to set up a redis-sentinel configuration for failover support. Here is my configuration:

machine1: IP 10.0.0.1, Redis port 6379, redis-sentinel port 26379
machine2: IP 10.0.0.2, Redis port 6379, redis-sentinel port 26379
machine3: IP 10.0.0.3, Redis port 6379, redis-sentinel port 26379

Redis sentinel config

machine 1 :

sentinel monitor mymaster 10.0.0.1 6379 2    
sentinel down-after-milliseconds mymaster 60000    
sentinel failover-timeout mymaster 180000    
sentinel parallel-syncs mymaster 1

machine 2 :

sentinel monitor mymaster 10.0.0.1 6379 2    
sentinel down-after-milliseconds mymaster 60000    
sentinel failover-timeout mymaster 180000    
sentinel parallel-syncs mymaster 1

machine 3:

sentinel monitor mymaster 10.0.0.1 6379 2    
sentinel down-after-milliseconds mymaster 60000    
sentinel failover-timeout mymaster 180000    
sentinel parallel-syncs mymaster 1

I added machine 2 and machine 3 as slaves of machine 1, and replication is working fine. But when machine 1 goes down, neither of the other machines is promoted to master; they keep acting as slaves. Is there a configuration issue with my setup?

Solution

Some questions before I can give a better answer:

Is there authentication running on the redis instances?
Have the sentinels actually detected the pod's topology (the master and its slaves)?

If the sentinel configurations above are the complete files, then the sentinels have not actually attached to the master. Sentinel rewrites its configuration file to store the topology it discovers, so the lines you initially configured would be accompanied by what it found. In particular, we would see slave entries as well.
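To check this on each machine, you can scan the sentinel config file for the entries Sentinel appends after discovery. Below is a minimal sketch, assuming the rewritten lines use the `sentinel known-slave` / `sentinel known-sentinel` directives (newer Redis versions write `known-replica` instead). The function name and the sample text are hypothetical, and the run ID in the sample is a placeholder.

```python
def discovered_topology(conf_text, master_name="mymaster"):
    """Collect the replica and sentinel entries found in a sentinel config."""
    found = {"replicas": [], "sentinels": []}
    for line in conf_text.splitlines():
        parts = line.split()
        # Expected shape: sentinel <directive> <master-name> <ip> <port> [...]
        if len(parts) < 5 or parts[0] != "sentinel" or parts[2] != master_name:
            continue
        directive = parts[1]
        if directive in ("known-slave", "known-replica"):
            found["replicas"].append((parts[3], int(parts[4])))
        elif directive == "known-sentinel":
            found["sentinels"].append((parts[3], int(parts[4])))
    return found


# Hypothetical config contents after Sentinel has attached to the master.
sample = """\
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel known-slave mymaster 10.0.0.2 6379
sentinel known-slave mymaster 10.0.0.3 6379
sentinel known-sentinel mymaster 10.0.0.2 26379 0000000000000000000000000000000000000000
"""

topology = discovered_topology(sample)
print(topology["replicas"])
print(topology["sentinels"])
```

If both lists come back empty for a file on disk, that sentinel has most likely never attached to the master.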

The other possibility is that not enough sentinels to reach a quorum have connected to the master successfully. If Redis is configured to require authentication, you also need to give the sentinels the authentication token, using the sentinel set command (the auth-pass option).
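The quorum condition can be sketched as follows. This is a simplified model, not Sentinel's actual code: it assumes a failover needs at least `quorum` sentinels to agree that the master is down, plus a majority of all known sentinels reachable to authorize it. The function and parameter names are made up for illustration.

```python
def can_fail_over(total_sentinels, quorum, agree_down, reachable):
    """Return True if a failover could start in this simplified model.

    total_sentinels: sentinels known for this master
    quorum:          last argument of the `sentinel monitor` line
    agree_down:      sentinels that currently consider the master down
    reachable:       sentinels that can talk to each other and vote
    """
    majority = total_sentinels // 2 + 1
    # Condition 1: enough sentinels agree that the master is down.
    marked_down = agree_down >= quorum
    # Condition 2: a majority of all sentinels must be able to authorize it.
    authorized = reachable >= majority
    return marked_down and authorized


# The setup from the question: 3 sentinels, quorum 2.
print(can_fail_over(total_sentinels=3, quorum=2, agree_down=2, reachable=2))
print(can_fail_over(total_sentinels=3, quorum=2, agree_down=1, reachable=3))
```

With quorum 2, a single sentinel that is correctly monitoring the master is not enough, which is why it matters how many sentinels have actually attached.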

If you can post the complete configuration, as well as the sentinel logs from when you take the master down, we can provide more specific advice.

On a related note, in production I would recommend against such a setup. With the one you have, you can wind up with what is known as split-brain. If the machine the master is on gets isolated from the others but is still running, the other two will elect a new master, at which point you will have two masters. If clients are still able to connect to the original master, existing connections will stay on it, but new ones that use sentinel to find the master will connect to the second master.
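The split-brain scenario can be illustrated with a toy model. The sketch below assumes every machine runs one Redis instance and one sentinel, that the old master keeps accepting writes while isolated, and that a partition side promotes a slave only if it holds both the quorum and a majority of sentinels. The names are hypothetical, and the promoted slave is picked arbitrarily.

```python
def masters_after_partition(groups, old_master, quorum):
    """Return the machines acting as master after the network splits.

    groups: list of sets of machine names that can still reach each other.
    """
    total = sum(len(group) for group in groups)
    majority = total // 2 + 1
    masters = []
    for group in groups:
        if old_master in group:
            # The old master is still running on its side of the split.
            masters.append(old_master)
        elif len(group) >= quorum and len(group) >= majority:
            # This side can agree the master is down and authorize a
            # failover, so one of its slaves is promoted.
            masters.append(sorted(group)[0])
    return masters


split = [{"machine1"}, {"machine2", "machine3"}]
print(masters_after_partition(split, old_master="machine1", quorum=2))
```

For the three-machine layout in the question, isolating machine1 leaves two machines in the result list: the original master and a newly promoted one.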

By running the sentinels on different machines from the Redis instances, you reduce this risk. If you have a limited number of client machines and can run sentinel there, you can nearly or completely eliminate this possibility.