Search code examples
mongodbmongodb-indexes

Mongo Triple Compound Index


If you have a double compound index { a : 1, b : 1}, it makes sense to me that the index won't be used if you query on b alone (i.e. you cannot "skip" a in your query). The index will however be used if you query on a alone.

However, given a triple compound index { a : 1, b: 1, c: 1} my explain command is showing that the index is used when you query on a and c (i.e. you can "skip" b in your query).

How can Mongo use an abc index on a query for ac, and how effective is the index in this case?

Background:

My use case is that sometimes I want to query on a,b,c and sometimes I want to query on a,c. Now should I create only 1 index on a,b,c or should I create one on a,c and one on a,b,c?

(It doesn't make sense to create an index on a,c,b because c is a multi-key index with good selectivity.)


Solution

  • bottom line / tl;dr: Index b can be 'skipped' if a and c are queried for equality or inequality, but not, for instance, for sorts on c.

    This is a very good question. Unfortunately, I couldn't find anything that authoritatively answers this in greater detail. I believe the performance of such queries has improved over the last years, so I wouldn't trust old material on the topic.

    The whole thing is quite complicated because it depends on the selectivity on your indexes and whether you query for equality, inequality and/or sort, so explain() is your only friend, but here are some things I found:

    Caveat: What comes now is a mixture of experimental results, reasoning and guessing. I might be stretching Kyle's analogy too far, and I might even be completely wrong (and unlucky, because my test results loosely match my reasoning).

    It is clear that the index of A can be used, which, depending on the selectivity of A, is certainly very helpful. 'Skipping' B can be tricky, or not. Let's keep this similar to Kyle's cookbook example:

    French
        Beef
            ...
        Chicken
            Coq au Vin
            Roasted Chicken
        Lamb
            ...
        ...
    

    If you now ask me to find some French dish called "Chateaubriand", I can use index A and, because I don't know the ingredient, will have to scan all dishes in A. On the other hand, I do know that the list of dishes in each category is sorted through the index C, so I will only have to look for the strings starting with, say, "Cha" in each ingredient-list. If there are 50 ingredients, I will need 50 lookups instead of just one, but that is a lot better than having to scan every French dish!

    In my experiments, the number was a lot smaller than the number of distinct values in b: it never seemd to exceed 2. However, I tested this only with a single collection, and it probably has to do with the selectivity of the b-index.

    If you asked me to give you an alphabetically sorted list of all French dishes, though, I'd be in trouble. Now the index on C is worthless, I'd have to merge-sort all those index lists. I will have to scan every element to do so.

    This reflects in my tests. Here are some simplified results. The original collection has datetimes, ints and strings, but I wanted to keep things simple, so it's now all ints.

    Essentially, there are only two classes of queries: those where nscanned <= 2 * limit, and those that have to scan the entire collection (120k documents). The index is {a, b, c}:

    // fast (range query on c while skipping b)
    > db.Test.find({"a" : 43, "c" : { $lte : 45454 }});
    // slow (sorting)
    > db.Test.find({"a" : 43, "c" : { $lte : 45454 }}).sort({ "c" : -1});
    > db.Test.find({"a" : 43, "c" : { $lte : 45454 }}).sort({ "b" : -1}); 
    
    // fast (can sort on c if b included in the query)
    > db.Test.find({"a" : 43, "b" : 7887, "c" : { $lte : 45454 }}).sort({ "c" : -1});
    
    // fast (older tutorials claim this is slow)
    > db.Test.find({"a" : {$gte : 43}, "c" : { $lte : 45454 }});
    

    Your mileage will vary.