Mongo docs state:
The Mongo multikey feature can automatically index arrays of values.
That's nice. But how about sorting based on multikeys? More specifically, how to sort a collection according to array match percentage?
For example, I have a pattern [ 'fruit', 'citrus' ]
and a collection, that looks like this:
{
title: 'Apples',
tags: [ 'fruit' ]
},
{
title: 'Oranges',
tags: [ 'fruit', 'citrus' ]
},
{
title: 'Potato',
tags: [ 'vegetable' ]
}
Now, I want to sort the collection according to match percentage of each entry to the tags pattern. Oranges must come first, apples second and potatoes last.
What's the most efficient and easy way to do it?
As of MongoDB 2.1 a similar computation can be done using the aggregation framework. The syntax is something like
db.fruits.aggregate(
{$match : {tags : {$in : ["fruit", "citrus"]}}},
{$unwind : "$tags"},
{$group : {_id : "$title", numTagMatches : {$sum : 1}}},
{$sort : {numTagMatches : -1}} )
which returns
{
"_id" : "Oranges",
"numTagMatches" : 2
},
{
"_id" : "Apples",
"numTagMatches" : 1
}
This should be much faster than the map-reduce method for two reasons. First because the implementation is native C++ rather than javascript. Second, because "$match" will filter out the items which don't match at all (if this is not what you want, you can leave out the "$match" part, and change the "$sum" part to be either 1 or 0 depending on if the tag is equal to "fruit" or "citrus" or neither).
The only caveat here is that mongo 2.1 isn't recommended for production yet. If you're running in production you'll need to wait for 2.2. But if you're just experimenting on your own you can play around with 2.1, as the aggregation framework should be more performant.