Search code examples
mongodbmongoosenosql-aggregation

Calculating collection stats for a subset of documents in MongoDB


I know the cardinal rule of SE is to not ask a question without giving examples of what you've already tried, but in this case I can't find where to begin. I've looked at the documentation for MongoDB and it looks like there are only two ways to calculate storage usage:

  1. db.collection.stats() returns the statistics about the entire collection. In my case I need to know the amount of storage being used to by a subset of data within a collection (data for a particular user).
  2. Object.bsonsize(<document>) returns the storage size of a single record, which would require a cursor function to calculate the size of each document, one at a time. My only concern with this approach is performance with large amounts of data. If a single user has tens of thousands of documents this process could take too long.

Does anyone know of a way to calculate the aggregate document size of set of records within a collection efficiently and accurately.

Thanks for the help.


Solution

  • This may not be the most efficient or accurate way to do it, but I ended up using a Mongoose plugin to get the size of the JSON representation of the document before it's saved:

    module.exports = exports = function defaultPlugin(schema, options){
        schema.add({
            userId: { type: mongoose.Schema.Types.ObjectId, ref: "User", required: true },
            recordSize: Number
        });
    
        schema.pre('save', function(next) {
            this.recordSize = JSON.stringify(this).length;
            next();
        });
    }
    

    This will convert the schema object to a JSON representation, get it's length, then store the size in the document itself. I understand that this will actually add a tiny bit of extra storage to record the size, but it's the best I could come up with.

    Then, to generate a storage report, I'm using a simple aggregate call to get the sum of all of the recordSize values in the collection, filtered by userId:

    mongoose.model('YouCollectionName').aggregate([
    {
        $match: { 
            userId: userId
        }
    },
    { 
        $group: {
            _id: null,
            recordSize: { $sum: '$recordSize'},
            recordCount: { $sum: 1 }
        }
    }
    ], function (err, results) {
       //Do something with your results
    });