Tags: javascript, node.js, mongodb, mongoose, mongoose-schema

MongoDB embedded documents: size limit and aggregation performance concerns


MongoDB's documentation suggests putting as much data as possible in a single document. It also suggests NOT using ObjectId-based references to sub-documents unless the data in those sub-documents must be referenced from more than one document.

In my case I have a one-to-many relationship like this:

Log schema:

const model = (mongoose) => {
    // One document per production log entry.
    const LogSchema = new mongoose.Schema({
        result: { type: String, required: true },
        operation: { type: Date, required: true },
        x: { type: Number, required: true },
        y: { type: Number, required: true },
        z: { type: Number, required: true }
    });
    const model = mongoose.model("Log", LogSchema);
    return model;
};
module.exports = model;

Machine schema:

const model = (mongoose) => {
    const MachineSchema = new mongoose.Schema({
        model: { type: String, required: true },
        description: { type: String, required: true },
        logs: [ mongoose.model("Log").schema ]
    });
    const model = mongoose.model("Machine", MachineSchema);
    return model;
};
module.exports = model;

Each Machine will have many Production_Log documents (more than one million). Using embedded documents, I hit the 16 MB per-document limit very quickly during my tests and could not add any more Production_Log documents to the Machine documents.

My questions

  1. Is this a case where one should be using sub-documents as ObjectId references instead of embedded documents?

  2. Is there any other solution I could evaluate?

  3. I will be accessing Production_Log documents to generate stats for each Machine using the aggregation framework. Are there any extra considerations I should take into account in the schema design?

Thank you very much in advance for your advice!


Solution

  • Database normalization is not applicable to MongoDB

    MongoDB scales better when you store the full information in a single document (data redundancy). Database normalization forces you to split the data across different collections, and once your data grows, that split becomes a bottleneck.

    Use only the Log schema, denormalizing the machine fields into each log document:

    const model = (mongoose) => {
        const LogSchema = new mongoose.Schema({
            // Machine fields denormalized into every log document:
            model: { type: String, required: true },
            description: { type: String, required: true },
            // Log fields:
            result: { type: String, required: true },
            operation: { type: Date, required: true },
            x: { type: Number, required: true },
            y: { type: Number, required: true },
            z: { type: Number, required: true }
        });
        const model = mongoose.model("Log", LogSchema);
        return model;
    };
    module.exports = model;
    

    Read and write operations scale well this way, because each log is its own small document in a single collection; see the write sketch below.
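
    For example, a write for each production event could look like the following. This is a minimal sketch; connection handling is omitted, and the sample values ("CNC-500", "5-axis mill", the coordinates) are placeholders, not values from the original question.

    const mongoose = require("mongoose");

    async function recordLog() {
        // Assumes the Log model defined above has already been registered.
        const Log = mongoose.model("Log");

        // One document per production event; the machine's model and
        // description are copied (denormalized) into every log.
        await Log.create({
            model: "CNC-500",            // placeholder machine model
            description: "5-axis mill",  // placeholder machine description
            result: "OK",
            operation: new Date(),
            x: 12.5,
            y: 3.2,
            z: 0.8
        });
    }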

    With the aggregation framework you can process the logs and compute the desired statistics per machine; see the sketch below.
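
    As a sketch of how those stats could be computed, you can group the logs by the denormalized machine fields. The date filter and the output field names here are illustrative, not part of the original answer.

    const mongoose = require("mongoose");

    async function statsPerMachine() {
        const Log = mongoose.model("Log");

        // Group logs by machine model and compute per-machine statistics.
        return Log.aggregate([
            // Optional filter; an index on { operation: 1 } would typically
            // support this stage on large collections.
            { $match: { operation: { $gte: new Date("2023-01-01") } } },
            {
                $group: {
                    _id: "$model",            // one bucket per machine model
                    totalLogs: { $sum: 1 },
                    avgX: { $avg: "$x" },
                    avgY: { $avg: "$y" },
                    avgZ: { $avg: "$z" }
                }
            },
            { $sort: { totalLogs: -1 } }
        ]);
    }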