Search code examples
mongodbdatabase-performancequery-performanceindices

MongoDB complex indices


I'm trying to understand how to best work with indices in MongoDB. Lets say that I have a collection of documents like this one:

{
  _id:        1,
  keywords:   ["gap", "casual", "shorts", "oatmeal"],
  age:        21,
  brand:     "Gap",
  color:     "Black",
  gender:    "female",     
  retailer:  "Gap",
  style:     "Casual Shorts",
  student:    false,
  location:  "US",
}

and I regularly run a query to find all documents that match a set of criteria for each of those fields, something like:

db.items.find({ age:      { $gt: 13, $lt: 40 },
                brand:    { $in: ['Gap', 'Target'] },
                retailer: { $in: ['Gap', 'Target'] },
                gender:   { $in: ['male', 'female'] },
                style:    { $in: ['Casual Shorts', 'Jeans']},
                location: { $in: ['US', 'International'] },
                color:    { $in: ['Black', 'Green'] },
                keywords: { $all: ['gap', 'casual'] }
              })

I'm trying to figure what sort of index I can create to improve the speed of a query such as this. Should I create a compound index like this:

db.items.ensureIndex({ age: 1, brand: 1, retailer: 1, gender: 1, style: 1, location: 1, color: 1, keywords: 1})

or is there a better set of indices I can create to optimize this query?


Solution

  • Should I create a compound index like this:

    db.items.ensureIndex({age: 1, brand: 1, retailer: 1, gender: 1, style: 1, location: 1, color: 1, keywords: 1})

    You can create an index like the one above, but you're indexing almost the entire collection. Indexes take space; the more fields in the index, the more space is used. Usually RAM, although they can be swapped out. They also incur write penalty.

    Your index seems wasteful, since probably indexing just a few of those fields will make MongoDB scan a set of documents that is close to the expected result of the find operation.

    Is there a better set of indices I can create to optimize this query?

    Like I said before, probably yes. But this question is very difficult to answer without knowing details of the collection, like the amount of documents it has, which values each field can have, how those values are distributed in the collection (50% gender male, 50% gender female?), how they correlate to each other, etc.

    There are a few indexing strategies, but normally you should strive to create indexes with high selectivity. Choose "small" field combinations that will help MongoDB locate the desired documents scanning a "reasonable" amount of them. Again, "small" and "reasonable" will depend on the characteristics of the collection and query you are performing.

    Since this is a fairly complex subject, here are some references that should help you building more appropriate indexes.

    http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/ http://docs.mongodb.org/manual/faq/indexes/#how-do-you-determine-what-fields-to-index http://docs.mongodb.org/manual/tutorial/create-queries-that-ensure-selectivity/

    And use cursor.explain to evaluate your indexes.

    http://docs.mongodb.org/manual/reference/method/cursor.explain/