redis, master-slave, redis-sentinel

Redis Sentinel out of sync with servers in a cluster


We have a setup with a number of Redis (2.8) servers (let's say 4) and as many Redis Sentinels. On startup of each machine, we set a preselected machine as master through the command line and all the rest as slaves of it, and the Sentinels all monitor these machines. The clients first connect to the local Sentinel, retrieve the master's IP address, and then connect there.

This setup is trouble-free most of the time, but sometimes the Sentinels go out of sync with the servers. If I name the machines A, B, C and D, the Sentinels will think B is the master while the Redis servers are all connected to A as the master. Bringing down the Redis server on B doesn't help either. I had to bring it down and manually run "Sentinel failover" on A to fix the issue. My questions are: 1. What causes this to happen, and what is the easiest and quickest way to fix it? 2. What is the best configuration; is there something better than this?


Solution

  • The only time you should set a master is the first time. Once Sentinel has taken over management of replication, you should let it do it. This includes restarts. Don't use the command line to set replication; let Sentinel and Redis manage it. This is why you're getting issues: you've told Sentinel it is authoritative, but you are telling the Redis servers to ignore Sentinel.

    Sentinel stores the status in its config file, so when it restarts it can resume the last configuration. So even on restart, let Sentinel do its job.

    Also, if you have 4 servers (be specific, not "let's say"), you should be running a quorum of three on your monitor statement in Sentinel. With a quorum of two you can wind up with two masters.
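    To make the quorum arithmetic concrete, here is a minimal Python sketch. It only models the vote count behind the advice above, not Sentinel's actual failover protocol. The master name `mymaster`, the address `192.0.2.10`, and all function names are hypothetical placeholders; the only real Redis syntax is the `sentinel monitor <name> <ip> <port> <quorum>` directive.

```python
def majority_quorum(sentinel_count: int) -> int:
    """Return the smallest quorum that is a strict majority of the Sentinels."""
    if sentinel_count < 1:
        raise ValueError("need at least one Sentinel")
    return sentinel_count // 2 + 1


def two_groups_can_each_reach_quorum(sentinel_count: int, quorum: int) -> bool:
    """True if the Sentinels could split into two disjoint groups that
    each contain at least `quorum` members (the split-brain risk)."""
    return 2 * quorum <= sentinel_count


def monitor_line(name: str, host: str, port: int, sentinel_count: int) -> str:
    """Render a `sentinel monitor` directive using a majority quorum."""
    quorum = majority_quorum(sentinel_count)
    return f"sentinel monitor {name} {host} {port} {quorum}"


if __name__ == "__main__":
    # Hypothetical master name and address, for illustration only.
    print(monitor_line("mymaster", "192.0.2.10", 6379, 4))
    # prints: sentinel monitor mymaster 192.0.2.10 6379 3
    print(two_groups_can_each_reach_quorum(4, 2))  # prints: True
    print(two_groups_can_each_reach_quorum(4, 3))  # prints: False
```

    With four Sentinels, a quorum of two lets a 2/2 network split produce two groups that each believe they have enough votes, while a quorum of three cannot be met by both sides at once. The rendered line would go in each Sentinel's config file once, at initial setup; after that, Sentinel rewrites its own config as the topology changes.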