
Primary goes into RECOVERING status after stopping and restarting MongoDB replica set nodes


When I stop the nodes of my replica set and start them up again, the primary node goes into the "RECOVERING" state.

I have a replica set created and running without authorization. In order to use authorization I have added users with db.createUser(...), and enabled authorization in the configuration file:

security:
   authorization: "enabled"
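
(The db.createUser(...) calls mentioned above are not shown in the question; as a minimal sketch only, with a placeholder user name, password, and role rather than the ones actually used, such a call run against the admin database might look like this:)

    use admin
    db.createUser({
      user: "admin",           // placeholder user name
      pwd: "adminPassword",    // placeholder password
      roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
    })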

Before stopping the replica set (and even after restarting the cluster without adding the security parameters), rs.status() shows:

{
        "set" : "REPLICASET",
        "date" : ISODate("2016-09-08T09:57:50.335Z"),
        "myState" : 1,
        "term" : NumberLong(7),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.1.167:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 301,
                        "optime" : {
                                "ts" : Timestamp(1473328390, 2),
                                "t" : NumberLong(7)
                        },
                        "optimeDate" : ISODate("2016-09-08T09:53:10Z"),
                        "electionTime" : Timestamp(1473328390, 1),
                        "electionDate" : ISODate("2016-09-08T09:53:10Z"),
                        "configVersion" : 1,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "192.168.1.168:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 295,
                        "optime" : {
                                "ts" : Timestamp(1473328390, 2),
                                "t" : NumberLong(7)
                        },
                        "optimeDate" : ISODate("2016-09-08T09:53:10Z"),
                        "lastHeartbeat" : ISODate("2016-09-08T09:57:48.679Z"),
                        "lastHeartbeatRecv" : ISODate("2016-09-08T09:57:49.676Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "192.168.1.167:27017",
                        "configVersion" : 1
                },
                {
                        "_id" : 2,
                        "name" : "192.168.1.169:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 295,
                        "optime" : {
                                "ts" : Timestamp(1473328390, 2),
                                "t" : NumberLong(7)
                        },
                        "optimeDate" : ISODate("2016-09-08T09:53:10Z"),
                        "lastHeartbeat" : ISODate("2016-09-08T09:57:48.680Z"),
                        "lastHeartbeatRecv" : ISODate("2016-09-08T09:57:49.054Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "192.168.1.168:27017",
                        "configVersion" : 1
                }
        ],
        "ok" : 1
}

In order to start using this configuration, I have stopped each node as follows:

[root@n--- etc]# mongo --port 27017 --eval 'db.adminCommand("shutdown")'
MongoDB shell version: 3.2.9
connecting to: 127.0.0.1:27017/test
2016-09-02T14:26:15.784+0200 W NETWORK  [thread1] Failed to connect to 127.0.0.1:27017, reason: errno:111 Connection refused
2016-09-02T14:26:15.785+0200 E QUERY    [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:231:14

After this shutdown, I have confirmed that the process does not exist by checking the output from ps -ax | grep mongo.

But when I start the nodes again and log in with my credentials, rs.status() now indicates:

{
        "set" : "REPLICASET",
        "date" : ISODate("2016-09-08T13:19:12.963Z"),
        "myState" : 3,
        "term" : NumberLong(7),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.1.167:27017",
                        "health" : 1,
                        "state" : 3,
                        "stateStr" : "RECOVERING",
                        "uptime" : 42,
                        "optime" : {
                                "ts" : Timestamp(1473340490, 6),
                                "t" : NumberLong(7)
                        },
                        "optimeDate" : ISODate("2016-09-08T13:14:50Z"),
                        "infoMessage" : "could not find member to sync from",
                        "configVersion" : 1,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "192.168.1.168:27017",
                        "health" : 0,
                        "state" : 6,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2016-09-08T13:19:10.553Z"),
                        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                        "pingMs" : NumberLong(0),
                        "authenticated" : false,
                        "configVersion" : -1
                },
                {
                        "_id" : 2,
                        "name" : "192.168.1.169:27017",
                        "health" : 0,
                        "state" : 6,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2016-09-08T13:19:10.552Z"),
                        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                        "pingMs" : NumberLong(0),
                        "authenticated" : false,
                        "configVersion" : -1
                }
        ],
        "ok" : 1
}

Why? Perhaps this shutdown command is not a good way to stop mongod; however, I also tested using 'kill <pid>', and the restart ends up in the same state.
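
(For comparison, the shell also provides the helper db.shutdownServer() on the admin database for a clean shutdown; a sketch of the equivalent call, assuming the node is still reachable locally on port 27017:)

    mongo --port 27017 admin --eval 'db.shutdownServer()'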

In this state I don't know how to repair the cluster; I have started over (removing the dbpath files and reconfiguring the replica set). I also tried '--repair', but it has not worked.

Info about my system:

  • Mongo version: 3.2
  • I start the process as root; perhaps it should be started as the 'mongod' user?
  • This is my start command: mongod --config /etc/mongod.conf
  • keyFile configuration does not work; if I add "--keyFile /path/to/file", mongod only prints
    "about to fork child process, waiting until server is ready for connections." and never becomes available. The file has all permissions, but mongod cannot use the keyFile (see the permissions sketch after this list).
  • An example of the "net.bindIp" configuration, from mongod.conf on one machine:

    net:
      port: 27017
      bindIp: 127.0.0.1,192.168.1.167
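
Regarding the keyFile problem noted in the list above: mongod will not use a key file that is readable by group or others, and the file must be readable by the user the process runs as. A sketch of tightening it, reusing the placeholder path /path/to/file from the list and assuming mongod runs as the 'mongod' user:

    # placeholder path and user; adjust to your environment
    chown mongod:mongod /path/to/file
    chmod 600 /path/to/file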
    

Solution

  • Note: This solution is Windows-specific but can easily be ported to *nix based systems.

    You'll need to perform the following steps in sequence. First of all, start your mongod instances.

    start "29001" mongod --dbpath "C:\data\db\r1" --port 29001
    start "29002" mongod --dbpath "C:\data\db\r2" --port 29002
    start "29003" mongod --dbpath "C:\data\db\r3" --port 29003 
    

    Connect with mongo to each node and create an administrator user. I prefer creating a superuser.

    > use admin
    > db.createUser({user: "root", pwd: "123456", roles:["root"]})
    

    You may create other users as deemed necessary.

    Create a key file. See the documentation for valid key file contents.

    Note: On *nix based systems, set the key file's permissions to 400 with chmod.

    In my case, I created the key file as:

    echo mysecret==key > C:\data\key\key.txt
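
    On a *nix based system, the key file from the note above can instead be generated and locked down in one step; a sketch, using a placeholder path:

    openssl rand -base64 756 > /data/key/key.txt
    chmod 400 /data/key/key.txt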
    

    Now restart your MongoDB servers with the --keyFile and --replSet flags.

    start "29001" mongod --dbpath "C:\data\db\r1" --port 29001 --replSet "rs1" --keyFile C:\data\key\key.txt
    start "29002" mongod --dbpath "C:\data\db\r2" --port 29002 --replSet "rs1" --keyFile C:\data\key\key.txt
    start "29003" mongod --dbpath "C:\data\db\r3" --port 29003 --replSet "rs1" --keyFile C:\data\key\key.txt
    

    Once all mongod instances are up and running, connect to any one of them with authentication.

    mongo --port 29001 -u "root" -p "123456" --authenticationDatabase "admin"
    

    Initiate the replica set:

    > use admin
    > rs.initiate()
    rs1:PRIMARY> rs.add("localhost:29002")
    { "ok" : 1 }
    rs1:PRIMARY> rs.add("localhost:29003")
    { "ok" : 1 }
    

    Note: You may need to replace localhost with a machine name or IP address.
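
    Finally, re-running rs.status() from the authenticated session is a quick way to confirm that every member is healthy again; a sketch of such a check (the output shown is only illustrative):

    rs1:PRIMARY> rs.status().members.forEach(function (m) { print(m.name + " " + m.stateStr); })
    localhost:29001 PRIMARY
    localhost:29002 SECONDARY
    localhost:29003 SECONDARY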