Search for two fields but only score once in Elasticsearch


Let's say I have these documents in Elasticsearch:

{
    "display_name": "Jose Cummings",
    "username": "josecummings"
},
{
    "display_name": "Jose Ramirez",
    "username": "elite_gamer"
},
{
    "display_name": "Lance Abrams",
    "username": "abrams1"
},
{
    "display_name": "Steve Smith",
    "username": "josesmose"
}

I want to run an "as you type" search for Jose against both the display_name and username fields, which I can do with this:

{
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "fields": [
                        "display_name",
                        "username"
                    ],
                    "query": "Jose",
                    "type": "bool_prefix",
                    "fuzziness": "AUTO",
                    "boost": 50
                }
            }
        }
    }
}

The issue here is that when I search for Jose, Jose Cummings gets 100 points while Jose Ramirez and Steve Smith only get 50 points, because it seems to sum the scores for the two fields. This essentially rewards a user for having the same display_name as username, which we do not want to happen.

Is there a way to only take the max score from the two fields? I've tried dozens of different combinations now using function_score, boost_mode/score_mode, constant_score, trying to do a should match with multiple match_bool_prefix queries, etc. Nothing I've tried seems to achieve this.


Solution

  • Try this:

    {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "fields": [
                  "display_name^50",
                  "username^50"
                ],
                "query": "Jose",
                "type": "bool_prefix",
                "fuzziness": "AUTO",
                "tie_breaker": 0.3
              }
            }
          ]
        }
      }
    }
    

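    Note the effect of tie_breaker. At 0.0, only the best-scoring field counts, which is the "max of the two fields" behavior you asked for. For 0 < x < 1, the best score is added to x times the scores of the other matching fields; with 0.3 as above, and assuming each matching field scores 50 as in your example, Jose Cummings would get 50 + 0.3 × 50 = 65. At 1.0, all field scores are summed, which matches the 100-versus-50 result you are seeing.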
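To make the arithmetic concrete, here is a minimal Python sketch of how a dis_max-style tie_breaker combines per-field scores. It is a model of the formula only, not Elasticsearch code: `combined_score` is a hypothetical helper, and the per-field scores of 50 are taken from the numbers in the question rather than from a real index.

```python
def combined_score(field_scores, tie_breaker):
    """Best field score plus tie_breaker times the other matching field scores."""
    matching = [s for s in field_scores if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical per-field scores (display_name, username) from the question.
jose_cummings = [50.0, 50.0]  # matches on both fields
jose_ramirez = [50.0, 0.0]    # matches on display_name only

for tie_breaker in (0.0, 0.3, 1.0):
    print(
        tie_breaker,
        combined_score(jose_cummings, tie_breaker),
        combined_score(jose_ramirez, tie_breaker),
    )
```

Under these assumptions, only tie_breaker 0.0 gives both users the same score; any larger value rewards the document that matches on both fields.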


    Also note that bool_prefix scoring behaves like most_fields, but uses a match_bool_prefix query instead of a match query.

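    Perhaps you do want to match only values where some token starts with jose. But a username such as cool_jose will then be left out, because the standard analyzer keeps it as the single token cool_jose, which does not start with jose (unless you apply, for example, an analyzer other than the standard one that splits on underscores).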
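The tokenization point can be illustrated with a rough Python model. This is a simplification under stated assumptions, not the real analysis chain: the two tokenizer functions are hypothetical stand-ins (one keeps underscores inside a token, as the standard analyzer is assumed to do here; the other splits on them), and fuzziness is ignored.

```python
import re


def tokens_keeping_underscores(text):
    # Assumed approximation of the standard analyzer: lowercase,
    # underscores stay inside a token.
    return re.findall(r"\w+", text.lower())


def tokens_splitting_underscores(text):
    # Hypothetical custom analyzer that also breaks tokens at underscores.
    return re.findall(r"[^\W_]+", text.lower())


def prefix_matches(tokens, typed):
    # Prefix matching on the typed text: some token must start with it.
    typed = typed.lower()
    return any(token.startswith(typed) for token in tokens)


for username in ("josecummings", "josesmose", "cool_jose"):
    print(
        username,
        prefix_matches(tokens_keeping_underscores(username), "Jose"),
        prefix_matches(tokens_splitting_underscores(username), "Jose"),
    )
```

In this model, cool_jose matches only once the underscore is treated as a token boundary, while josecummings and josesmose match either way.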