performance mongodb indexing filtering compound-index

MongoDB filtering by many parameters (compound index or not)

I have catalog with products and I want to filter it by many parameters: category, price, size, color, wieght etc..

So the question is about indexing.

I can try to use compound index on all fields, and query them in the same order I indexed them. But what if I need to filter only by size & color, and then only by price & weight? Creating compound index for each possible filtering query would be overkill because there might be too many parameters.

So after some search I found an interesting approach

It is suggested there to use "normalized attributes":

{color: "red"} = 10
{weight: 125} = 25
{size: "M"} = 30

and now mongo record will look like this:

{_id: ..., attributes: [10,25,30]}

Then I should index by attributes and after that I can query this way:

db.items.find(attributes: {$all: [10,25,30]})

Advatages are:

smaller indexes
faster lookups
any number of parameters
easy growing

What I didn't undestand is HOW do I get those numbers for each attribute? Are they calculated somehow (like md5)? Or should I create a different collection and store each key-value-number there? And get the numbers from there first - each time when I need to filter "items" collection?

And what do you think about this approach?

UPDATE: What if I will use concatenated strings instead of numbers?

{_id: ..., attributes: ["language.English", "color.red"]}

Solution

Searching (efficiently) in a store catalog is indeed a non-trivial task.

Yes, you can create an additional collection and store all values there

{name: "language", value: "English", numValue: "13"}

No, you don't need to re-query this collection every time. Data size here will be likely small, so you can cache it in application process. Don't forget the hint: always put attribute with lowest cardinality first.

Another approach would be to use bit-masking, but since MongoDB has limited integer length, you can't get unlimited number of attributes.