An author has multiple articles. An article always has exactly one author.
How would I design this in MongoDB?
At first I thought I could just embed 'article' as a subdocument in 'authors'. But since I need to fetch all articles of all authors quite frequently, I suspect this is not the best solution.
Now I am thinking it might be better to have two separate collections: 'authors' and 'articles'. An author would have multiple articles, so the author schema would look like this:
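    const { Schema } = require("mongoose");

    const authorSchema = new Schema({
        email: { type: String, unique: true, lowercase: true },
        password: String,
        fname: String,
        lname: String,
        // one-to-many side: ids of all articles written by this author
        articles: [{
            type: Schema.Types.ObjectId,
            ref: "article"
        }]
    })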
Each article has exactly one author. So it would look like:
    const articleSchema = new Schema({
        title: { type: String, unique: true },
        sentences: Array,
        image: String,
        tags: Array,
        word_count: Number,
        author: {
            type: Schema.Types.ObjectId,
            ref: "author"
        }
    })
The question is: is this valid? Will I run into problems with this sort of setup?
I am worried that saving multiple articles in author, as well as saving author in each article, is a redundancy to be avoided, or at the least not considered very good practice.
First thought: you are right that, in many cases, this is not how a schema should be designed. That said, there is always a tradeoff in deciding when to split something into a new collection, and it often depends on the access patterns, as you have already pointed out, and on the nature of the relationship between the collections.
For your case, I consider separate collections for article and author the most reasonable choice.
For your current problem you thought about the most versatile/general method to implement it. Consider the case when you leave out the author reference in the article schema: If you now want to now the authors of a specific article you waste a lot of resources on searching through all the authors and checking whether they are authors of that article instead of just following the references in the article.
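To make the difference concrete, here is a rough sketch of the two lookups. The helper names authorViaArticle and authorViaScan are made up for illustration, and Article and Author are assumed to be Mongoose models compiled from the two schemas in the question (registered as "article" and "author").

```javascript
// Hypothetical helpers, assuming `Article` and `Author` are Mongoose models
// built from the schemas in the question.

// With the author reference on the article: fetch the article by _id
// and let populate() follow the stored reference.
async function authorViaArticle(Article, articleId) {
  const article = await Article.findById(articleId).populate("author");
  return article ? article.author : null;
}

// Without that reference: the only way in is through the authors collection,
// matching the author whose `articles` array contains the id.
async function authorViaScan(Author, articleId) {
  return Author.findOne({ articles: articleId });
}
```

The second query has to examine the articles array of every author unless that field is indexed, which is the wasted work described above.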
Another point is that storing a few references does not take up much space, so this should not be a deal breaker, especially considering the cost of storage today.
The closest thing to a problem: remember that in a database like MongoDB, the program accessing the database has to maintain both sides of the relationship itself, which makes the program a little more complex.
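As a minimal sketch of what maintaining both sides could look like, the helper below writes the article first and then records its id on the author. The name addArticle is hypothetical, and Author and Article are again assumed to be Mongoose models built from the schemas in the question.

```javascript
// Hypothetical helper, assuming `Author` and `Article` are Mongoose models
// built from the schemas in the question.
async function addArticle(Author, Article, authorId, articleData) {
  // Side 1: the article stores a reference to its single author.
  const article = await Article.create({ ...articleData, author: authorId });

  // Side 2: the author records the new article's id.
  // $addToSet only appends the id if it is not already in the array.
  await Author.updateOne(
    { _id: authorId },
    { $addToSet: { articles: article._id } }
  );

  return article;
}
```

Note that these are two separate writes, so if the second one fails the two sides disagree until you repair them. Deleting an article needs the mirror-image cleanup on the author.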
In conclusion, you should not worry that storing the references in both collections is bad practice; it is in fact common to do so to speed up certain access patterns.