Search code examples
pythonmongodbindexingmultikey

Why is a single index faster than a compound index in a query for two keys? (MongoDB, multi-key)


I have created 4 indexes to test query performance in my collection when quering for two fields of the same document, one of which is an array (needs a multi-key index). Two of the indexes are single and two compound.

I am surpised because of getting better performance with one of the single indexes than with the compound ones. I was expecting to obtain the best performace with a compound index, because I understand that it indexes the two fields allowing for faster querying.

These are my indexes:

{    "v" : 1, 
     "key" : { "_id" : 1 }, 
     "ns" : "bt_twitter.mallorca.mallorca", 
     "name" : "_id_"  
}, 
{    "v" : 1, 
     "key" : { "epoch_creation_date" :1 }, 
     "ns" : "bt_twitter.mallorca.mallorca", 
     "name" : "epoch_creation_date_1"  
}, 
{     "v" : 1, 
      "key" : { "related_hashtags" : 1 }, 
      "ns" : "bt_twitter.mallorca.mallorca", 
      "name" : "related_hashtags_1"  
},  
{     "v" : 1, 
      "key" : { "epoch_creation_date" : 1, "related_hashtags" : 1 }, 
      "ns" : "bt_twitter.mallorca.mallorca", 
      "name" : "epoch_creation_date_1_related_hashtags_1"  
}

My queries and performance indicators are (hint parameter shows the index used at each query):

QUERY 1:

active_collection.find(
    {'epoch_creation_date': {'$exists': True}}, 
    {"_id": 0, "related_hashtags":1}
).hint([("epoch_creation_date", ASCENDING)]).explain()

millis: 237

nscanned: 101226

QUERY 2:

active_collection.find(
    {'epoch_creation_date': {'$exists': True}}, 
    {"_id": 0, "related_hashtags": 1}
).hint([("related_hashtags", ASCENDING)]).explain()

millis: 1131

nscanned: 306715

QUERY 3:

active_collection.find(
     {'epoch_creation_date': {'$exists': True}},
     {"_id": 0, "related_hashtags": 1}
).hint([("epoch_creation_date", ASCENDING), ("related_hashtags", ASCENDING)]).explain()

millis: 935

nscanned: 306715

QUERY 4:

active_collection.find(
     {'epoch_creation_date': {'$exists': True}}, 
     {"_id": 0, "related_hashtags": 1}
).hint([("related_hashtags", ASCENDING),("epoch_creation_date", ASCENDING)]).explain()

millis: 1165

nscanned: 306715

QUERY 1 scans less documents, what is probably the reason to be faster. Can somebody help me to understand why is it performing better than queries with compound indexes? Therefore, when is better to use a compound index than a single one?

I am reading mongo documentation but these concepts are resulting hard for me to digest.

Thanks in advance.

UPDATED question (in response to Sammaye and Philipp)

This is the result of a full explain()

"cursor" : "BtreeCursor epoch_creation_date_1",
"isMultiKey" : false,
"n" : 101226,
"nscannedObjects" : 101226,
"nscanned" : 101226,
"nscannedObjectsAllPlans" : 101226,
"nscannedAllPlans" : 101226,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 242,
"indexBounds" : {u'epoch_creation_date': [[{u'$minElement': 1}, {u'$maxElement': 1}]]

},
"server" : "vmmongodb:27017"

for the following query:

active_collection.find(
{'epoch_creation_date': {'$exists': True}}, 
{"_id": 0, "related_hashtags":1})
.hint([("epoch_creation_date", ASCENDING)]).explain()

Solution

  • You created a compound index (named epoch_creation_date_1_related_hashtags_1), but you aren't using it in those hints. Instead of that you are using the two single-field indexes you also created (related_hashtags_1 and epoch_creation_date_1) in different order.

    Of those two indexes, only epoch_creation_date_1 is effective, because you aren't querying for both fields. You are only querying for one, and this is 'epoch_creation_date': {'$exists': True}. The field-filtering which you perform with {"_id": 0, "related_hashtags":1} is done on the documents which were found by that query. At that point, indexes are of no use anymore. That means any index on related_hashtags won't be able to increase performance on this query. The compound index (when you would actually use it) might be better than no index at all, but not as good as the index on epoch_creation_date only.