I'm trying to understand how to solve this in OpenSearch (but Elasticsearch solutions will do).
Essentially, I have an index of jobs and I'm trying to sort them by two parameters, giving equal weight to each: subscription tier and a popularity score (both are fields on each job document).
Normally, a sort orders by one field first and uses the other only to break ties; I need to blend them instead, giving 50/50 weight to each.
When jobs are sorted by relevance (the default), we want the order to combine each job's subscription tier and its individual relevance score, according to a weighting w, e.g. with this formula:
Jobs would be ranked according to their weighted score.
Weighted score = (r1 × w) + (r2 × (1 - w)), where:
r1 = the position a job ranks for a given search if only relevance is considered; and
r2 = the position a job ranks for a given search if only subscription is considered.
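If the ranks were computed client-side, the formula could be sketched as below. This is a minimal sketch, not something OpenSearch does natively. It assumes (hypothetically) that popularity_score stands in for relevance and bid for subscription tier, uses the test data from further down, and gives tied jobs the same position (competition ranking). Because r1 and r2 are positions, a lower weighted score means a better job.

```python
# Minimal sketch of the rank-blending formula:
#   weighted = r1 * w + r2 * (1 - w)
# r1 and r2 are 1-based positions, so a LOWER weighted value ranks higher.
# Assumption (hypothetical): popularity_score stands in for relevance and
# bid stands in for subscription tier.

# doc id -> (popularity_score, bid), taken from the test documents.
JOBS = {
    "1": (0.105, 100), "2": (0.06, 50), "3": (0.099, 25), "4": (0.155, 5),
    "5": (0.028, 100), "6": (0.118, 100), "7": (0.186, 50), "8": (0.019, 25),
    "9": (0.081, 5), "10": (0.124, 100), "11": (0.163, 100), "12": (0.025, 50),
    "13": (0.16, 25), "14": (0.119, 5), "15": (0.16, 100),
}


def positions(values):
    """1-based competition ranking (highest value first); ties share a position."""
    return {
        doc_id: 1 + sum(1 for other in values.values() if other > value)
        for doc_id, value in values.items()
    }


def blended_order(jobs, w=0.5):
    """Return (doc_id, weighted_position) pairs, best (lowest) first."""
    r1 = positions({k: pop for k, (pop, _) in jobs.items()})
    r2 = positions({k: bid for k, (_, bid) in jobs.items()})
    weighted = {k: r1[k] * w + r2[k] * (1 - w) for k in jobs}
    # Ascending by weighted position; the numeric doc id breaks remaining ties.
    return sorted(weighted.items(), key=lambda kv: (kv[1], int(kv[0])))


for doc_id, score in blended_order(JOBS):
    print(f"Job {doc_id}: {score}")
```

With w = 0.5, Job 11 (second by popularity, in the top bid tier) comes first. Note that every job's value for both fields is needed before any position can be computed.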
However, the problem is that I would need to run multiple searches to obtain each job's rank under each sorting criterion, which would be terribly inefficient. I'm trying to see whether I can solve this natively in OpenSearch.
I tried computing this with a script score function, for example, using just the two fields, but they are unrelated and not normalised relative to each other, so assigning them equal weight is challenging.
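One common way to make two unrelated fields comparable is min-max normalisation: rescale each field to 0..1 over the candidate set, then apply the 50/50 weights. The sketch below is a swapped-in technique, not an OpenSearch feature. It assumes the min and max of each field are known up front (here computed from the 15 test jobs); in a real query they would have to be supplied to a script, e.g. from an earlier min/max aggregation.

```python
# Minimal sketch: min-max normalise each field to 0..1, then blend 50/50.
# Assumption: min/max are taken over the whole candidate set.

# doc id -> (popularity_score, bid), taken from the test documents.
JOBS = {
    "1": (0.105, 100), "2": (0.06, 50), "3": (0.099, 25), "4": (0.155, 5),
    "5": (0.028, 100), "6": (0.118, 100), "7": (0.186, 50), "8": (0.019, 25),
    "9": (0.081, 5), "10": (0.124, 100), "11": (0.163, 100), "12": (0.025, 50),
    "13": (0.16, 25), "14": (0.119, 5), "15": (0.16, 100),
}


def min_max(values):
    """Rescale values to 0..1; a constant field maps to 0.0 to avoid dividing by zero."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def blended_scores(jobs, w=0.5):
    """Higher is better: w * normalised popularity + (1 - w) * normalised bid."""
    pop = min_max({k: p for k, (p, _) in jobs.items()})
    bid = min_max({k: b for k, (_, b) in jobs.items()})
    return {k: w * pop[k] + (1 - w) * bid[k] for k in jobs}


scores = blended_scores(JOBS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Unlike a plain two-level sort, a high popularity score can now lift a job above one in a higher bid tier (Job 7, bid 50, outranks Job 5, bid 100).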
Here is what I have tried so far. First, I added some test documents:
POST _bulk
{"index":{"_index":"tier-sort","_id":"1"}}
{"title":"Job 1","popularity_score":0.105,"bid":100}
{"index":{"_index":"tier-sort","_id":"2"}}
{"title":"Job 2","popularity_score":0.06,"bid":50}
{"index":{"_index":"tier-sort","_id":"3"}}
{"title":"Job 3","popularity_score":0.099,"bid":25}
{"index":{"_index":"tier-sort","_id":"4"}}
{"title":"Job 4","popularity_score":0.155,"bid":5}
{"index":{"_index":"tier-sort","_id":"5"}}
{"title":"Job 5","popularity_score":0.028,"bid":100}
{"index":{"_index":"tier-sort","_id":"6"}}
{"title":"Job 6","popularity_score":0.118,"bid":100}
{"index":{"_index":"tier-sort","_id":"7"}}
{"title":"Job 7","popularity_score":0.186,"bid":50}
{"index":{"_index":"tier-sort","_id":"8"}}
{"title":"Job 8","popularity_score":0.019,"bid":25}
{"index":{"_index":"tier-sort","_id":"9"}}
{"title":"Job 9","popularity_score":0.081,"bid":5}
{"index":{"_index":"tier-sort","_id":"10"}}
{"title":"Job 10","popularity_score":0.124,"bid":100}
{"index":{"_index":"tier-sort","_id":"11"}}
{"title":"Job 11","popularity_score":0.163,"bid":100}
{"index":{"_index":"tier-sort","_id":"12"}}
{"title":"Job 12","popularity_score":0.025,"bid":50}
{"index":{"_index":"tier-sort","_id":"13"}}
{"title":"Job 13","popularity_score":0.16,"bid":25}
{"index":{"_index":"tier-sort","_id":"14"}}
{"title":"Job 14","popularity_score":0.119,"bid":5}
{"index":{"_index":"tier-sort","_id":"15"}}
{"title":"Job 15","popularity_score":0.16,"bid":100}
Then I tried a function_score query with two script_score functions, so that each factor contributes half to the final score:
GET tier-sort/_search
{
"size": 100,
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": "doc['popularity_score'].value"
},
"weight": 0.5
},
{
"script_score": {
"script": "doc['bid'].value"
},
"weight": 0.5
}
],
"score_mode": "sum"
}
}
}
The problem, however, is normalisation: bid and popularity are on completely different scales. How can this be achieved in Elasticsearch? Is there a way to do it natively?
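A quick worked check of the problem, assuming the two script_score results are added (score_mode "sum", optionally with a weight of 0.5 on each; a constant weight does not change the order): because bid spans 5 to 100 while popularity_score stays below 0.2, the raw sum orders the jobs exactly as sorting by bid first and popularity second would. The test data is reproduced in Python purely for illustration.

```python
# Worked check: does summing the raw fields blend them 50/50?
# doc id -> (popularity_score, bid), taken from the test documents.
JOBS = {
    "1": (0.105, 100), "2": (0.06, 50), "3": (0.099, 25), "4": (0.155, 5),
    "5": (0.028, 100), "6": (0.118, 100), "7": (0.186, 50), "8": (0.019, 25),
    "9": (0.081, 5), "10": (0.124, 100), "11": (0.163, 100), "12": (0.025, 50),
    "13": (0.16, 25), "14": (0.119, 5), "15": (0.16, 100),
}


def raw_sum_order(jobs, w=0.5):
    """Order by w * popularity + (1 - w) * bid, highest first (no normalisation)."""
    scores = {k: w * pop + (1 - w) * bid for k, (pop, bid) in jobs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def bid_then_popularity_order(jobs):
    """Plain two-level sort: bid first, popularity_score as the tie-breaker."""
    return sorted(jobs, key=lambda k: (jobs[k][1], jobs[k][0]), reverse=True)


# Popularity differences (< 0.2) can never outweigh a bid difference (>= 20),
# so the "blend" is really just a bid sort with a popularity tie-breaker.
print(raw_sum_order(JOBS) == bid_then_popularity_order(JOBS))
```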
Thanks in advance!
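There are 2 ways to change the ranking of search results in Elasticsearch/OpenSearch:
1. Boosting: influence how _score is calculated (query boosts, function_score, rank_feature, etc.) and let the results be ordered by _score.
2. Sorting: specify an explicit sort. You can still sort on _score, but if you specify sort logic other than _score, the boosting logic will be ignored and _score will be set to null; only the sort part takes effect.
If your two factors are on different scales, the rank_feature query on a rank_features field can help you normalize them efficiently, e.g.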
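Create the index with a rank_features mapping (the rank_feature query only works on rank_feature or rank_features fields), then add some docs:
PUT tier-sort
{"mappings":{"properties":{"title":{"type":"text"},"rank":{"type":"rank_features"}}}}
POST _bulk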
{"index":{"_index":"tier-sort","_id":"1"}}
{"title":"Job 1","rank":{"popularity_score":0.105,"bid":100}}
{"index":{"_index":"tier-sort","_id":"2"}}
{"title":"Job 2","rank":{"popularity_score":0.06,"bid":50}}
{"index":{"_index":"tier-sort","_id":"3"}}
{"title":"Job 3","rank":{"popularity_score":0.099,"bid":25}}
Then apply a rank_feature query to each factor, giving each a boost of 0.5:
GET tier-sort/_search
{
"size": 100,
"query": {
"bool": {
"should": [
{
"rank_feature": {
"field": "rank.popularity_score",
"saturation": {},
"boost": 0.5
}
},
{
"rank_feature": {
"field": "rank.bid",
"saturation": {},
"boost": 0.5
}
}
]
}
}
}
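To see why this puts both factors on the same footing, here is a sketch of the arithmetic. It assumes the saturation formula documented for rank_feature, S / (S + pivot), and approximates the default pivot (documented by Elasticsearch as an approximate geometric mean of the field's values in the index) with the exact geometric mean of the three sample docs. Each clause is multiplied by its boost of 0.5 and the bool should clauses are summed, so every job lands between 0 and 1. Real scores may differ slightly because the engine's pivot is approximate.

```python
import math

# The three sample docs: doc id -> (popularity_score, bid).
DOCS = {"1": (0.105, 100), "2": (0.06, 50), "3": (0.099, 25)}


def geometric_mean(values):
    """Exact geometric mean; stands in for the engine's approximate default pivot."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def saturation(x, pivot):
    """rank_feature saturation function: x / (x + pivot), in (0, 1) for x > 0."""
    return x / (x + pivot)


def blended_scores(docs, boost=0.5):
    pop_pivot = geometric_mean(p for p, _ in docs.values())
    bid_pivot = geometric_mean(b for _, b in docs.values())
    # bool/should adds the clause scores; each clause is scaled by its boost.
    return {
        k: boost * saturation(p, pop_pivot) + boost * saturation(b, bid_pivot)
        for k, (p, b) in docs.items()
    }


scores = blended_scores(DOCS)
for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, round(scores[doc_id], 3))
```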
You can choose among the rank_feature query's built-in functions (saturation, log and sigmoid) and set the pivot explicitly to control how each factor is scaled. You can also use the Explain API (or set "explain": true on the search request) to see in detail how each score was calculated, which helps you check that the query behaves as you expect.
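As a rough illustration of why the choice of function matters, the sketch below applies the three functions' documented formulas (saturation S / (S + pivot), log log(scaling_factor + S), sigmoid S^exponent / (S^exponent + pivot^exponent)) to the lowest and highest value of each field in the question's data. The pivots (0.1 for popularity_score, 50 for bid), the scaling factor of 1 and the exponent of 0.6 are hypothetical values chosen for illustration only.

```python
import math


# Documented rank_feature function formulas; the parameters used below are hypothetical.
def saturation(x, pivot):
    return x / (x + pivot)


def log_fn(x, scaling_factor):
    return math.log(scaling_factor + x)  # natural log


def sigmoid(x, pivot, exponent):
    return x**exponent / (x**exponent + pivot**exponent)


# Lowest and highest values of each field in the question's test data.
POPULARITY = (0.019, 0.186)
BID = (5, 100)

# Hypothetical per-field parameters: (popularity function, bid function).
FUNCTIONS = {
    "saturation": (lambda x: saturation(x, 0.1), lambda x: saturation(x, 50)),
    "log": (lambda x: log_fn(x, 1), lambda x: log_fn(x, 1)),
    "sigmoid": (lambda x: sigmoid(x, 0.1, 0.6), lambda x: sigmoid(x, 50, 0.6)),
}


def output_ranges():
    """Return {function: ((pop_low, pop_high), (bid_low, bid_high))} after scoring."""
    result = {}
    for name, (pop_fn, bid_fn) in FUNCTIONS.items():
        result[name] = (
            tuple(pop_fn(v) for v in POPULARITY),
            tuple(bid_fn(v) for v in BID),
        )
    return result


for name, (pop_range, bid_range) in output_ranges().items():
    print(f"{name:10} popularity {pop_range[0]:.3f}..{pop_range[1]:.3f}  "
          f"bid {bid_range[0]:.3f}..{bid_range[1]:.3f}")
```

With log, the bid term stays on a much larger scale than popularity, while saturation and sigmoid map both fields into (0, 1), which is what a 50/50 blend needs.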