Our MongoDB setup uses three replica set shards. Each webserver runs a local mongos instance, and the client Node.js processes connect through it using Mongoose (3.6.20) and node-mongodb-native, so node-mongodb-native only ever talks to mongos on localhost.
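For reference, the client side is nothing exotic; the connection setup looks roughly like this (the database name is illustrative):

```js
// Connect to the mongos instance running on this webserver; Mongoose
// drives node-mongodb-native under the hood.
var mongoose = require('mongoose');

mongoose.connect('mongodb://localhost:27017/ourdb', function (err) {
  if (err) throw err;
  console.log('connected to mongos on localhost');
});
```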
When a replica set primary goes down hard (we can simulate this by running 'ifdown eth0' on the primary), mongos properly detects this and also detects that a new primary has been elected. So far, so good. But node-mongodb-native's connections to the mongos instance remain open yet non-functional, and a restart of the Node processes is required.
Our assumption was that mongos would kill any established connections to the dead primary and node-mongodb-native would reconnect, but that does not seem to be the case; both the server and the OS consider these connections open. By contrast, on a primary stepDown the clients fail over fine: connections are closed and reopened.
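For comparison, the graceful case is a plain stepDown issued in the mongo shell on the shard's primary (the 60-second no-re-election window is just what we used):

```js
// Run in the mongo shell connected to the shard's primary: the primary
// steps down and will not seek re-election for 60 seconds.
rs.stepDown(60);
```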
We have looked at socketTimeoutMS, but that seems like the wrong tool, since it also disconnects connections that are merely idle.
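For reference, this is how we would pass it through Mongoose 3.x; the 30-second value is arbitrary, and the problem is that a perfectly healthy but idle connection would be torn down just the same:

```js
// socketTimeoutMS is handed through to node-mongodb-native's sockets;
// it fires on any silence on the socket, not just on a dead connection.
mongoose.connect('mongodb://localhost:27017/ourdb', {
  server: {
    socketOptions: { socketTimeoutMS: 30000 }
  }
});
```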
Are we missing some configuration on the client or on mongos, or do we have to implement our own pinging?
Based on experimentation and the following MongoDB bug, this appears to be a shortcoming of mongos (or, if you prefer, of the client libraries) at this point. For now the answer seems to be 'write your own pinging logic in your app and trigger a reconnect when that fails', so that's what we are doing (see the sketch below).
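A minimal sketch of the ping loop we ended up with, assuming Mongoose; the interval, the timeout, and the disconnect-then-connect reconnect strategy are our own choices, not anything prescribed by the driver:

```js
var mongoose = require('mongoose');

var MONGO_URI = 'mongodb://localhost:27017/ourdb'; // illustrative
var PING_INTERVAL_MS = 5000; // how often to probe mongos
var PING_TIMEOUT_MS  = 2000; // how long before we declare the link dead

var reconnecting = false;

function reconnect() {
  if (reconnecting) return; // don't stack reconnect attempts
  reconnecting = true;
  mongoose.disconnect(function () {
    mongoose.connect(MONGO_URI, function () {
      reconnecting = false;
    });
  });
}

setInterval(function () {
  var db = mongoose.connection.db;
  if (!db || reconnecting) return; // not connected yet, or mid-reconnect

  // The failure mode is a connection that looks open but never answers,
  // so the ping is raced against a timeout instead of trusting it to
  // call back with an error.
  var done = false;
  var timer = setTimeout(function () {
    done = true;
    console.error('mongos ping timed out; forcing reconnect');
    reconnect();
  }, PING_TIMEOUT_MS);

  // ping sends the cheap { ping: 1 } command over the same pooled
  // sockets our queries use, so it hangs exactly when they would.
  db.admin().ping(function (err) {
    if (done) return; // the timeout already handled it
    done = true;
    clearTimeout(timer);
    if (err) {
      console.error('mongos ping failed; forcing reconnect:', err);
      reconnect();
    }
  });
}, PING_INTERVAL_MS);
```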