Search code examples
mongodbsubdocument

In MongoDB, when to use a simple subdocument, when an array with 2-field elements?


Background

I am storing table rows as MongoDb documents, with each column having a name. Let's say table has these columns of interest: Identifier, Person, Date, Count. The MongoDb document also has some extra fields separate from the table data, represented by timestamp. Columns are not fixed (which is why I use schema-free database to store them in the first place).

There will be need to do various complex, but so far unspecified queries. I am not very concerned about performance, though query performance may conceivably become a bottleneck. Once inserted, documents will not be modifed (a new document with same Identifier will be created instead), and insertions are not very frequent (let's say, 1000 new MongoDb documents per day). So amount of data will steadily grow over time.

Example

The straight-forward approach is having a collection of MongoDb documents like:

{
   _id: XXXX,
   insertDate: ISODate("2012-10-15T21:26:17Z"),
   flag: true,
   data: {
       Identifier: "AB002",
       Person: "John002",
       Date: ISODate("2013-11-16T21:26:17Z"),
       Count: 1
   }
}

Now I have seen an alternative approach (for example in accepted answer of this question), using array with two fields per object:

{
   _id: XXXX,
   insertDate: ISODate("2012-10-15T21:26:17Z"),
   flag: true,
   data: [
       { field: "Identifier", value: "AB002" },
       { field: "Person", value: "John001" },
       { field: "Date", value: ISODate("2013-11-16T21:26:17Z") },
       { field: "Count", value: 1 }
   ]
}

Questions

Does the 2nd approach make any sense at all?

If yes, then how to choose which to use? Especially, are there some specific kinds of queries which are easy/cheap with one approach, hard/costly with another? Any "rules of thumb" on which way to go, or pro-con lists for both? Example real-life cases of one aproach being inconvenient would be especially valuable.


Solution

  • In your specific example the First version is a lot more appropriate and simple. You have to think in terms of how you would query your document.

    It is a lot simpler to query your database like this: db.collection.find({"data.Identifier": "AB002"})

    Although I'm not 100% sure why you even need the inner document. Why can't structure your document like:

    {
       _id:  "AB002",
       insertDate: ISODate("2012-10-15T21:26:17Z"),
       flag: true,
       Person: "John002",
       Date: ISODate("2013-11-16T21:26:17Z"),
       Count: 1
    }
    

    Pros of first example:

    • Simple to query
    • Enforces unique keys, but your data won't have two columns with the same name anyway
    • I would assume mongoDB would generate better query plans because the structure is a lot more simple (haven't tested)

    Pros of second example:

    • Allows multiple entries with the same key/field, but I don't feel that is useful in your case
    • A single index on the array can be used for all of its entries regardless of their field name