Suppose I have a MongoDB where I have separate texts
which consist of statements
.
I need to be able to search for the texts
, which have certain keywords in statements
(also multiple texts that have an occurrence of a search term).
I also need to be able to find all the statements
in all the texts added by a particular user, which contain a particular search phrase.
My question: do I need to create a separate collection for statements
or can I simply add them as nested into the texts
collection?
So, option 1 (separate collections):
Texts collection
text: {
name: 'nabokov',
id: '1'
}
Statements collection:
statement: {
text_id: '1',
id: '24',
text: 'He opened the window and saw the sky`
}
Option 2 (nested):
text: {
name: 'nabokov',
id: '1'
statements: [
id: '24',
text: 'He opened the window and saw the sky`
]
}
Which MongoDB storage schema is better if I want to retrieve statements separately based on keyword search and retain the contextual data (e.g. which text they belong to etc.)
How would this affect the write / read speed for larger DBs (e.g. > 100 Gb).
My texts would be limited to 16 Mb.
For MongoDB document schema design w.r.t. performance, there are several factors that could be helpful to take into consideration:
For your scenario, we actually need more context / details from you to work out a more sensible "answer". But here are some common scenarios that I have personally come into before and it might be useful for you as a reference.
text
as a root document that is not frequently updated; Most of the queries are based on the statement
collection as a child collection.In this case, it could be a good idea to denormalize the text
document and replicating the field name
into corresponding statement
document.
e.g.
statement: {
text_id: '1',
text_name: 'nabokov',
id: '24',
text: 'He opened the window and saw the sky`
}
In this way, you gain performance boost by avoiding a $lookup
to the text
collection, while only incurring a small cost of maintaining the new text_name
column. The cost is small since the text
document is not going to be updated frequently anyway.
text
document will be associated with small number of statements
objects/documents only.In this case, it could be a good idea to go for your option 1 (i.e. keep the statements in an array of text
document). The advantage is you can compose rather simple queries and avoid the cost in maintaining another collection of statement
.
Here is a very good document to read more about MongoDB schema design.