
Dynamic size of subdocument in MongoDB


I'm using MongoDB and Mongoose for my web application. The web app handles registration for swimming competitions, and each competition can have any number of races. My data structure as of now:

{
  "_id": "1",
  "name": "Utmanaren",
  "location": "town",
  "startdate": "20150627",
  "enddate": "20150627",
  "race": {
    "gender": "m",
    "style": "freestyle",
    "length": "100"
  }
}

Doing this, I need to determine and define the number of races for every competition in advance. One solution I tried is keeping the races in separate documents, each with an ID that says which competition the race belongs to, like below.

{
  "belongsTOId": "1",
  "gender": "m",
  "style": "freestyle",
  "length": "100"
}
{
  "belongsTOId": "1",
  "gender": "f",
  "style": "butterfly",
  "length": "50"
}

Is there a way of creating and defining a dynamic number of races as subdocuments in MongoDB?

Thanks!


Solution

  • You have basically two approaches to modelling your data: you can either embed the race documents in the competition document or reference them from a separate collection.

    Let's consider the following example, which maps the relationship between a swimming competition and its races. It shows the advantage of embedding over referencing when you need to view many data entities in the context of another. In this one-to-many relationship, the competition document holds multiple race entities in an array:

    // db.competition schema
    {
        "_id": 1,
        "name": "Utmanaren",
        "location": "town",
        "startdate": "20150627",
        "enddate": "20150627",
        "races": [
            {
                "gender": "m",
                "style": "freestyle",
                "length": "100"
            },
            {
                "gender": "f",
                "style": "butterfly",
                "length": "50"
            }
        ]
    }
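
Because `races` in the schema above is an array, the number of races does not have to be fixed up front: races can be appended later with the `$push` update operator. The snippet below is only a minimal sketch; the helper name `buildAddRacesUpdate` and the sample backstroke race are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper (not from the original answer): builds the update
// document that appends one or more races to a competition's `races` array.
function buildAddRacesUpdate(newRaces) {
  // $push combined with $each appends every element of `newRaces`.
  return { $push: { races: { $each: newRaces } } };
}

// Illustrative race; the values are invented for this example.
const pushUpdate = buildAddRacesUpdate([
  { gender: "f", style: "backstroke", length: "200" },
]);

// In the mongo shell this would be applied roughly as:
//   db.competition.updateOne({ _id: 1 }, pushUpdate);
console.log(JSON.stringify(pushUpdate));
```

The helper only builds a plain object, so it can be passed unchanged to the shell, the Node.js driver, or a Mongoose model's update call.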
    

    With the embedded data model, your application can retrieve the complete swimming competition information with a single query. This design has other merits as well, one of them being data locality: since MongoDB stores each document contiguously on disk, putting all the data you need in one document means a spinning disk spends less time seeking to it. The other advantage of embedded documents is atomicity and isolation when writing data, because a write to a single document is atomic. To illustrate, say you want to remove a competition that has a race whose "style" property is "butterfly"; this can be done with a single (atomic) operation:

    db.competition.remove({"races.style": "butterfly"});
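
Note that the `remove` call above deletes the whole competition document, races included. If the goal is instead to drop only the matching races and keep the competition, a `$pull` update is one option. The following is a sketch under that assumption; the helper name `buildRemoveRacesByStyleUpdate` is hypothetical.

```javascript
// Hypothetical helper (not from the original answer): builds an update that
// removes every race with the given style from the embedded `races` array.
function buildRemoveRacesByStyleUpdate(style) {
  // $pull deletes the array elements that match the condition.
  return { $pull: { races: { style: style } } };
}

const pullUpdate = buildRemoveRacesByStyleUpdate("butterfly");

// In the mongo shell this would be applied roughly as:
//   db.competition.updateMany({ "races.style": "butterfly" }, pullUpdate);
console.log(JSON.stringify(pullUpdate));
```

The change to each competition document is still applied atomically, since all of its races live inside that one document.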
    

    For more details on data modelling in MongoDB, see the documentation pages "Data Modeling Introduction" and, specifically, "Model One-to-Many Relationships with Embedded Documents".

    The other design option is referencing: the documents follow a normalized schema in which each race document contains a reference to its competition document:

    // db.race schema
    {
        "_id": 1,
        "competition_id": 1,
        "gender": "m",
        "style": "freestyle",
        "length": "100"
    },
    {
        "_id": 2,
        "competition_id": 1,
        "gender": "f",
        "style": "butterfly",
        "length": "50"
    }
    

    The above approach gives increased flexibility in performing queries. For instance, retrieving all child race documents whose parent competition has id 1 is straightforward; simply query the race collection:

    db.race.find({"competition_id": 1});
    

    The normalized schema with document references also has an advantage when you have one-to-many relationships with very unpredictable arity. If you have hundreds or thousands of race documents per competition, the embedding option runs into size constraints: the larger the document, the more RAM it uses, and MongoDB documents have a hard size limit of 16 MB.
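
To get a feel for when the 16 MB limit becomes relevant, the sketch below makes a rough estimate of how many embedded races of a given shape would fit in one document. It assumes Node.js (for `Buffer`) and uses the length of the JSON text as a stand-in for the real BSON size, which is encoded differently, so the figure is a ballpark only. The helper names are hypothetical.

```javascript
// MongoDB's per-document size limit: 16 MB.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Rough size estimate: byte length of the JSON text (assumes Node.js).
// BSON encodes values differently, so treat this as a ballpark figure.
function approxBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical helper: roughly how many races of this shape fit in one
// document, ignoring the competition's own fields and array overhead.
function approxMaxEmbeddedRaces(sampleRace) {
  return Math.floor(MAX_DOCUMENT_BYTES / approxBytes(sampleRace));
}

const sampleRace = { gender: "m", style: "freestyle", length: "100" };
console.log(approxMaxEmbeddedRaces(sampleRace));
```

For race documents as small as the ones in this question, the estimate comes out in the hundreds of thousands, so the limit only starts to matter for far larger or far more numerous subdocuments.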

    The trade-off is that if your application frequently retrieves the race data together with the competition information, it needs to issue multiple queries to resolve the references.
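
The extra round trip can be sketched as follows. To keep the example self-contained, two in-memory arrays stand in for the `competition` and `race` collections, and the helper `findCompetitionWithRaces` is hypothetical; against a real database each step would be a separate query.

```javascript
// In-memory stand-ins for the two collections (sample data from above).
const competitions = [{ _id: 1, name: "Utmanaren", location: "town" }];
const races = [
  { _id: 1, competition_id: 1, gender: "m", style: "freestyle", length: "100" },
  { _id: 2, competition_id: 1, gender: "f", style: "butterfly", length: "50" },
];

// Hypothetical helper: resolves the references by hand.
function findCompetitionWithRaces(competitionId) {
  // Query 1: fetch the parent competition document.
  const competition = competitions.find((c) => c._id === competitionId);
  if (!competition) return null;
  // Query 2: fetch every race that points at that competition.
  const ownRaces = races.filter((r) => r.competition_id === competitionId);
  // Merge the two results into the shape the embedded model gives directly.
  return { ...competition, races: ownRaces };
}

const result = findCompetitionWithRaces(1);
console.log(result.races.length);
```

With the embedded model the merged shape comes back from a single query; with references, the application (or a server-side join such as the `$lookup` aggregation stage) has to assemble it.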

    The general rule of thumb is that if your application's query pattern is well known and the data tends to be accessed in only one way, an embedded approach works well. If your application queries data in many ways, or you are unable to anticipate the query patterns, a more normalized document-referencing model is more appropriate.

    Ref:

    MongoDB Applied Design Patterns: Practical Use Cases with the Leading NoSQL Database, by Rick Copeland