I still can't understand how indexing a key of an embedded document really works.
Suppose I have the following collection of blog posts:
{
_id:0,
author: 'John Doe',
content: 'How indexing an embedded document work?',
comments:
[
{
sender:'Jane Doe',
content: 'I can\'t make it out either.'
},
etc...
]
},
etc...
Suppose I now set an index upon the sender property in the comments:
db.blog.createIndex({'comments.sender':1})
Now the question:
Does this mean that a big index is created for all elements ordered by sender in ascending order no matter in which array they are? Or an index is created for each array?
To make it clearer: when I do
blog.find({'comments.sender':'Jane Doe'}).toArray(function(err, array){})
Will it go through each blog post and look up each array till a record is found in that array and move to the next array in the next post? Or there is a big index, in which each record (ordered by sender) maps to the original array in which this match resides?
This creates an index with one entry per comment. If you have 2 blog posts with 3 comments each, you will have an index with 6 entries, indexed by the comment sender name. Your second hypothesis is the right one: your search will find all comments with the right sender efficiently using this index and then return the corresponding blog posts.
So short answer: yes, do this, it works and you will get optimal query time.
You can easily check your queries complexity with explain:
> db.blog.insert({'comments': [{'sender': 'Jane'}]})
WriteResult({ "nInserted" : 1 })
> db.blog.insert({'comments': [{'sender': 'Jane'}, {'sender': 'Joe'}]})
WriteResult({ "nInserted" : 1 })
> db.blog.insert({'comments': [{'sender': 'Joe'}]})
WriteResult({ "nInserted" : 1 })
> db.blog.ensureIndex({'comments.sender': 1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.blog.find({'comments.sender': 'Jane'}).count()
2
> db.blog.find({'comments.sender': 'Jane'}).explain()
{
"cursor" : "BtreeCursor comments.sender_1",
"isMultiKey" : true,
"n" : 2,
"nscannedObjects" : 2,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"comments.sender" : [
[
"Jane",
"Jane"
]
]
},
"server" : "metrics.9.0.api.production.infinit.io:27017",
"filterSet" : false
}
Here we see that the index was indeed used ("BtreeCursor") and that only 2 objects were scanned, not all 3. Drop the index and you will get a table scan:
> db.blog.dropIndex({'comments.sender': 1})
{ "nIndexesWas" : 2, "ok" : 1 }
> db.blog.find({'comments.sender': 'Jane'}).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 2,
"nscannedObjects" : 3,
"nscanned" : 3,
"nscannedObjectsAllPlans" : 3,
"nscannedAllPlans" : 3,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"server" : "metrics.9.0.api.production.infinit.io:27017",
"filterSet" : false
}