I have catalog with products and I want to filter it by many parameters: category, price, size, color, wieght etc..
So the question is about indexing.
I can try to use compound index on all fields, and query them in the same order I indexed them. But what if I need to filter only by size & color, and then only by price & weight? Creating compound index for each possible filtering query would be overkill because there might be too many parameters.
So after some search I found an interesting approach
It is suggested there to use "normalized attributes":
{color: "red"} = 10
{weight: 125} = 25
{size: "M"} = 30
and now mongo record will look like this:
{_id: ..., attributes: [10,25,30]}
Then I should index by attributes and after that I can query this way:
db.items.find(attributes: {$all: [10,25,30]})
Advatages are:
What I didn't undestand is HOW do I get those numbers for each attribute? Are they calculated somehow (like md5)? Or should I create a different collection and store each key-value-number there? And get the numbers from there first - each time when I need to filter "items" collection?
And what do you think about this approach?
UPDATE: What if I will use concatenated strings instead of numbers?
{_id: ..., attributes: ["language.English", "color.red"]}
Searching (efficiently) in a store catalog is indeed a non-trivial task.
Yes, you can create an additional collection and store all values there
{name: "language", value: "English", numValue: "13"}
No, you don't need to re-query this collection every time. Data size here will be likely small, so you can cache it in application process. Don't forget the hint: always put attribute with lowest cardinality first.
Another approach would be to use bit-masking, but since MongoDB has limited integer length, you can't get unlimited number of attributes.