Search code examples
mongodbmeteordatabase-designcollectionsnosql

Good DB-design to reference different collections in MongoDB


I'm regularly facing the similar problem on how to reference several different collections in the same property in MongoDB (or any other NoSQL database). Usually I use Meteor.js for my projects.

Let's take an example for a notes collection that includes some tagIds:

{
    _id: "XXXXXXXXXXXXXXXXXXXXXXXX",
    message: "This is an important message",
    dateTime: "2018-03-01T00:00:00.000Z",
    tagIds: [
        "123456789012345678901234",
        "abcdefabcdefabcdefabcdef"
    ]
}

So a certain id referenced in tagIds might either be a person, a product or even another note.

Of course the most obvious solutions for this imo is to save the type as well:

...
    tagIds: [
        {
            type: "note",
            id: "123456789012345678901234",
        },
        {
            type: "person",
            id: "abcdefabcdefabcdefabcdef",
        }
    ]
...

Another solution I'm also thinking about is to use several fields for each collection, but I'm not sure if this has any other benefits (apart from the clear separation):

...
    tagIdsNotes: ["123456789012345678901234"],
    tagIdsPersons: ["abcdefabcdefabcdefabcdef"],
...

But somehow both solutions feel strange to me as they need a lot of extra information (it would be nice to have this information implicit) and so I wanted to ask, if this is the way to go, or if you know any other solution for this?


Solution

  • If you use Meteor Methods to pull this data, you have a chance to run some code, get from DB, run some mappings, pull again from DB etc and return a result. However, if you use pub/sub, things are different, you need to keep it really simple and light. So, first question: method or pub/sub?

    Your question is really more like: should I embed and how much to embed, or should I not embed and build relations (only keep an id of a tag in the message object) and later use aggregations or should I denormalize (duplicate data): http://highscalability.com/building-scalable-databases-denormalization-nosql-movement-and-digg

    All these are ok in Mongo depending on your case: https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-3

    The way I do this is to keep a tags Collection indexed by messageId and eventually date (for sorting). When you have a message, you get all tags by querying the Tags Collection rather than mapping over your tags in your message object and send 3 different queries to 3 different Collections (person, product, note).

    If you embed your tags data in the message object, let's say in your UX you want to show there are 3 tags and on click you get those 3 tags. You can basically pull those tags when you pulled the message (and might not need that data) or pull the tags on an action such as click. So, you might want to consider what data you need in your view and only pull that. You could keep an Integer as number of tags on the message object and save the tags in either a tags Collection or embed in your message object.

    Following the principles of NoSQL it is ok and advisable to save some data multiple times in different collections to make your queries super fast. So in a Tags Collection you could save as well things related to your original objects. Let's say

    // Tags
    
    {
    ...
    messageId: 'xxx',
    createdAt: Date,
    person: {
     firstName: 'John',
     lastName: 'Smith',
     userId: 'yyyy',
     ...etc
    
    },
    {
    ...
    messageId: 'xxy',
    createdAt: Date,
    product: {
     name: 'product_name',
     productId: 'yyzz',
     ...etc
    
    },
    }