So in our configuration we have 6 total mongod servers. 3 of them we've been running for a while (version 3.0.6), and we recently added 3 new ones from our new datacenter (v3.2.6), as we figured this would be a good way to migrate usage from the old to the new. These newer boxes have been in rotation for several weeks and should be up to date. None of the boxes has a slave delay above 0.
One of the original servers had a priority of 2, while the other 5 had a priority of 1. Otherwise, the config for each server was identical. We have no additional servers (arbiters, etc.) for voting purposes.
Our applications use all 6 addresses, so they will failover on their own.
So today, thinking we had thought of everything, we stopped the mongod process on the 3 original servers.
And none of the new boxes would become primary. They all remained secondary. So we turned the old primary back on, and it immediately became primary again. So, thinking the issue might be something with the priority, we reduced the old primary to 1 and set one of the new boxes to 2. We saved the config, and the new box became the primary.
Thinking we licked it, we once again shut down the old primary.
And the new box immediately stepped back down to secondary, leaving no primary.
So we started back up the old primary, and the new box was immediately made primary again.
So, for now, we have set the priority of the old box to 0 and left it running.
But we can't stay running that way. Why wasn't one of the new machines automatically promoted to primary? Why would it step down if we removed an older box?
Easy to answer: 6 - 3 = 3, which is smaller than 4, the number of members you would have needed to build a quorum. With only 3 of the 6 servers up, the remaining members cannot form a majority (counted against the number of voting members defined in the replica set configuration, not against the members that happen to be running). They revert to secondary state, since a primary cannot reliably be determined: it could just be a network partition going on. Allowing elections to succeed with less than a quorum would make the dreaded split-brain situation possible.
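The arithmetic can be sketched in plain JavaScript (the language the mongo shell uses). This is a simplified model, assuming every member carries one vote and there are no arbiters; majority and canElectPrimary are hypothetical helper names, not part of the mongo shell API. Note that priority does not appear anywhere in the check.

```javascript
// Simplified model: every configured member carries exactly one vote.
// A primary can only be elected (or stay primary) when a strict majority
// of the voting members in the replica set configuration is reachable.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function canElectPrimary(votingMembers, membersUp) {
  return membersUp >= majority(votingMembers);
}

// The situation from the question: 6 configured members.
console.log(majority(6));           // 4
console.log(canElectPrimary(6, 3)); // false: 3 old members stopped, 3 < 4
console.log(canElectPrimary(6, 4)); // true: old primary back up, 4 >= 4
```

This also explains why the new box stepped down the moment the old primary was stopped: with 3 of 6 members up it could no longer see a majority, whatever its priority.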
This has nothing to do with priority, btw. And you should not fiddle with it unless you know exactly what you are doing.
Solving your problem should be easy enough, though:
Important: Make sure all "new" members are in either secondary or primary state before proceeding.

1. Run rs.remove() for each stopped old member (passing its "host:port") to remove it from the replica set configuration.
2. If an old member is still primary, run rs.stepDown() on it so that one of your new servers becomes primary, then connect to that new primary.
3. Run rs.remove() again to remove the last "old" member from the replica set.

With an odd number of voting members remaining in your replica set, they are able to build a quorum and elect a new primary.
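The three steps can be replayed as a simplified quorum check in plain JavaScript. This is a sketch under stated assumptions: one vote per member, no arbiters, and hypothetical host names (old1:27017 and so on). It models only the member counts; the real rs.remove() and rs.stepDown() calls still have to be run in the mongo shell against the actual primary.

```javascript
// Simplified model: one vote per member, no arbiters, priority ignored.
// Host names are hypothetical placeholders.
function hasQuorum(members) {
  const up = members.filter((m) => m.up).length;
  return up >= Math.floor(members.length / 2) + 1;
}

// Starting point: 3 old members (only one still running) plus 3 new ones.
let members = [
  { host: "old1:27017", up: true },
  { host: "old2:27017", up: false },
  { host: "old3:27017", up: false },
  { host: "new1:27017", up: true },
  { host: "new2:27017", up: true },
  { host: "new3:27017", up: true },
];
console.log(hasQuorum(members)); // true: 4 of 6 up

// Step 1: remove the two stopped old members from the configuration.
members = members.filter((m) => m.host !== "old2:27017" && m.host !== "old3:27017");
console.log(hasQuorum(members)); // true: 4 of 4 up

// Step 2 (stepDown) only changes which member is primary, not the counts.
// Step 3: remove the last old member.
members = members.filter((m) => m.host !== "old1:27017");
console.log(members.length, hasQuorum(members)); // 3 true

// With 3 voting members, one can fail and 2 of 3 is still a majority.
members[0].up = false;
console.log(hasQuorum(members)); // true
```

Dropping to 3 members, all in the new datacenter, means the set no longer depends on any old server to reach a majority.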