We have a very large MongoDB collection of documents with a set of pre-defined fields that may or may not hold a value. We need to gather the fill-rates of those fields, so we wrote a script that iterates over all documents and counts the fill-rate for each field; the problem is that it takes a long time to process all documents.
Is there a way to use db.collection.aggregate
or db.collection.mapReduce
to run such a script server-side?
Would it yield significant performance improvements?
Will it slow down other usages of that collection (e.g. holding a major lock)?
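To make the question concrete, here's roughly what I'd hope to express server-side. This is an untested sketch, and fieldA/fieldB are placeholders for our real field names:

collection.aggregate([
    { $match: query },
    { $group: {
        _id: null,
        total: { $sum: 1 },
        // count a document for a field only when the field is present and not null
        fieldA: { $sum: { $cond: [ { $eq: [ { $ifNull: ['$fieldA', null] }, null ] }, 0, 1 ] } },
        fieldB: { $sum: { $cond: [ { $eq: [ { $ifNull: ['$fieldB', null] }, null ] }, 0, 1 ] } }
    } }
], function (error, results) {
    // results[0] would hold the total and the per-field filled counts
});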
Answering my own question: I was able to migrate my script, which used a cursor to scan the whole collection, to a map-reduce query. Running on a sample of the collection, it seems to be at least twice as fast with map-reduce.
Here's how the old script worked (in node.js):
var cursor = collection.find(query, projection).sort({_id: 1}).limit(limit);
// walk the cursor one document at a time; processDoc calls next()
// again once it's done with the current document
var next = function() {
    cursor.nextObject(function(err, doc) {
        processDoc(doc, next);
    });
};
next();
and this is the new script:
collection.mapReduce(
    function () {
        // map: runs server-side once per matched document
        var processDoc = function(doc) {
            ...
        };
        processDoc(this);
    },
    function (key, values) {
        // reduce: sum the 1s emitted for each field name
        return Array.sum(values);
    },
    {
        query: query,
        out: {inline: 1}
    },
    function (error, results) {
        // print results
    }
);
processDoc stayed basically the same, but instead of incrementing a counter on a global stats object, I do:
emit(field_name, 1);
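For illustration, the map function ends up looking roughly like this (FIELDS and the field names are placeholders, not our real schema):

// sketch of the map function; FIELDS stands in for the list of
// pre-defined field names we track
function () {
    var FIELDS = ['fieldA', 'fieldB', 'fieldC'];
    var doc = this;
    FIELDS.forEach(function (field_name) {
        // count a field as filled when it's present and not null
        if (doc[field_name] !== undefined && doc[field_name] !== null) {
            emit(field_name, 1);
        }
    });
}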
Running both versions on a sample of 100k documents, the old script took 20 seconds and the new one took 8.
Some notes:

- The limit option doesn't work on sharded collections; I had to query for _id : { $gte: ..., $lte: ... } to create the sample size I needed (see the sketch below).
- jsMode : true doesn't work on sharded collections either (it might have improved performance even more); running the map-reduce manually on each shard might be a way to still gain that feature.
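For example, the sampling query looked something like this (the time bounds are placeholders; any pair of ObjectId bounds that brackets the desired sample would do):

var ObjectID = require('mongodb').ObjectID;

// limit isn't honored on sharded collections, so sample by an _id range instead;
// the dates below are placeholders for whatever range defines the sample
var query = {
    _id: {
        $gte: ObjectID.createFromTime(new Date('2014-01-01') / 1000),
        $lte: ObjectID.createFromTime(new Date('2014-01-02') / 1000)
    }
};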