mongodb, sharding

MongoS sharding metadata manager failed, asking for instance to be manually reset


My mongos servers are not starting; they log the following error:

SHARDING [Balancer] caught exception while doing balance: Server's sharding metadata manager failed to initialize and will remain in this state until the instance is manually reset :: caused by :: HostNotFound: unable to resolve DNS for host confserv_1.xyz.com

2016-05-02T17:57:06.612+0530 I SHARDING [Balancer] about to log metadata event into actionlog: { _id: "DB2255-2016-05-02T17:57:06.611+0530-5727479aa1051c5fb04fcc49", server: "mongoS1", clientAddr: "", time: new Date(1462192026611), what: "balancer.round", ns: "", details: { executionTimeMillis: 35, errorOccured: true, errmsg: "Server's sharding metadata manager failed to initialize and will remain in this state until the instance is manually reset :: caused by :: HostNotFoun..." } }  

When I connect to the config server using its hostname, it works fine.
I tried restarting the mongos server, but it does not come up.
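Since the underlying cause is a HostNotFound error, it is worth verifying that the OS resolver on each mongos host can actually resolve the config server hostname, independently of the mongo shell. A minimal sketch (the hostname confserv_1.xyz.com is taken from the log above; the script itself is only an illustration, not part of MongoDB):

```python
import socket

def can_resolve(hostname):
    """Return True if the OS resolver can map hostname to an IP address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

# Run this on every mongos and shard host, not only the one
# where the mongo shell connection happens to work:
for host in ["confserv_1.xyz.com"]:
    print(host, "->", "OK" if can_resolve(host) else "UNRESOLVED")
```

Note that a successful mongo shell connection from one machine does not prove that resolution works on every host in the cluster, which is exactly the trap described in the answer below.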

I checked the MongoDB source and found this error in
https://github.com/mongodb/mongo/blob/master/src/mongo/db/s/sharding_state.cpp

// TODO: remove after v3.4.
// This is for backwards compatibility with old style initialization through metadata
// commands/setShardVersion. As well as all assignments to _initializationStatus and
// _setInitializationState_inlock in this method.
if (_getInitializationState() == InitializationState::kInitializing) {
    auto waitStatus = _waitForInitialization_inlock(deadline, lk);
    if (!waitStatus.isOK()) {
        return waitStatus;
    }
}

if (_getInitializationState() == InitializationState::kError) {
    return {ErrorCodes::ManualInterventionRequired,
            str::stream() << "Server's sharding metadata manager failed to initialize and will "
                             "remain in this state until the instance is manually reset"
                          << causedBy(_initializationStatus)};
}  

But it does not say what manual intervention is required. My current MongoDB version is 3.2.6.


Solution

  • I just ran into this problem while trying to harden the security configuration. As in your case, I was able to connect to the config servers from all mongos instances.

    In my case I was also testing a setup with replica set members spread across different datacenters, and the problem only appeared after stepping down some primaries.

    In the end I noticed that, contrary to what the error message suggests, the issue occurred on some primaries in one datacenter that were unable to route back to the config server. After fixing the routing problem (via /etc/hosts in the end), no more problems occurred on the MongoDB side.
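If DNS is not reliable in your environment, one way to restore name resolution on each affected replica set member is a static entry in /etc/hosts, as the answer above hints. A sketch of such an entry (the IP address is a placeholder; substitute your real config server address):

```shell
# /etc/hosts on each shard primary/secondary that fails to reach the config server
# 10.0.0.15 is a placeholder IP for illustration only
10.0.0.15   confserv_1.xyz.com
```

After the entry is in place, resolution can be confirmed from each host (e.g. with `getent hosts confserv_1.xyz.com`) before restarting mongos.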