I have an Elasticsearch index whose documents are structured like this:
{
  "id": "foo",
  "tags": ["Tag1", "Tag2", "Tag3"],
  "special_tags": ["SpecialTag1", "SpecialTag2", "SpecialTag3"],
  "reserved_tags": ["ReservedTag1", "ReservedTag2", "Tag1", "SpecialTag2"],
  // rest of the document
}
The fields tags, special_tags, and reserved_tags are stored separately to support multiple use cases. In one of the queries, I want to order the documents by the number of searched tags that occur in any of the three fields.
For example, if I search for the three tags Tag1, Tag4, and SpecialTag3, the document above has 2 matching tags (Tag1 and SpecialTag3; a tag that appears in more than one field is counted once). Using this number, I want to add a custom score to this document and sort by the score.
I am already using function_score because the score depends on a few other attributes. To compute the number of matched tags, I tried a Painless script like the one below.
def matchedTags = 0;
def searchedTags = ["Tag1", "Tag4", "SpecialTag3"];
for (int i = 0; i < searchedTags.length; ++i) {
    // Count each searched tag at most once, even if it appears in several fields.
    if (doc['tags'].contains(searchedTags[i])) {
        matchedTags++;
        continue;
    }
    if (doc['special_tags'].contains(searchedTags[i])) {
        matchedTags++;
        continue;
    }
    if (doc['reserved_tags'].contains(searchedTags[i])) {
        matchedTags++;
    }
}
// logic to score on matchedTags (returning matchedTags for simplicity)
return matchedTags;
This runs as expected, but it is extremely slow. I assume that ES has to count the occurrences for each document and cannot use the index here. (If someone can shed light on how this works internally or provide links to documentation or other resources, that would be helpful.)
I want to have two scoring functions.
Is there any way to get the benefits of both fast searching and custom scoring with a script?
Any help is appreciated. Thanks.
We solved this using bitsets. For each document, we build a bitset with a bit set for every tag present in the document (tags, special_tags, etc.) and a clear bit for every other tag. This is stored as one big integer, which is effectively a condensed representation of all the tags in the document.
The application knows which bit corresponds to which tag. To count the matched tags, we create a second bitset with a bit set for every searched tag. Then, in the Painless script, we cast both bitsets to big integers, take their bitwise AND, and count the number of set bits.
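
The encoding and counting steps can be sketched in plain Java as follows. This is a minimal illustration under assumptions, not the exact code we run: the class name TagBitsets, the BIT_INDEX mapping, and the helper names encode and countMatches are all hypothetical, and how the resulting integer is stored in the index and passed to the script is not shown. The core of countMatches uses only BigInteger.and and BigInteger.bitCount, which is the same AND-then-popcount operation described above.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.Map;

class TagBitsets {
    // Hypothetical tag -> bit position mapping; in practice the application owns this.
    static final Map<String, Integer> BIT_INDEX = Map.of(
        "Tag1", 0, "Tag2", 1, "Tag3", 2, "Tag4", 3,
        "SpecialTag1", 4, "SpecialTag2", 5, "SpecialTag3", 6,
        "ReservedTag1", 7, "ReservedTag2", 8);

    // Build one integer with a bit set for every known tag in the list.
    // Duplicates are harmless because setting a bit twice changes nothing.
    static BigInteger encode(List<String> tags) {
        BigInteger bits = BigInteger.ZERO;
        for (String tag : tags) {
            Integer index = BIT_INDEX.get(tag);
            if (index != null) {
                bits = bits.setBit(index);
            }
        }
        return bits;
    }

    // Bitwise AND keeps only the tags present in both sets; bitCount counts them.
    static int countMatches(BigInteger docBits, BigInteger searchBits) {
        return docBits.and(searchBits).bitCount();
    }

    public static void main(String[] args) {
        // All tags of the example document, merged from the three fields.
        BigInteger docBits = encode(List.of(
            "Tag1", "Tag2", "Tag3",
            "SpecialTag1", "SpecialTag2", "SpecialTag3",
            "ReservedTag1", "ReservedTag2", "Tag1", "SpecialTag2"));
        BigInteger searchBits = encode(List.of("Tag1", "Tag4", "SpecialTag3"));
        System.out.println(countMatches(docBits, searchBits)); // prints 2
    }
}
```

Because each tag maps to exactly one bit, a tag that appears in several fields is still counted once, which matches the behavior of the original script.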