Tags: elasticsearch, search, elasticsearch-7

How to take the length of the aliases field out of the score calculation


Suppose we have documents describing people, each with a name and an array of aliases, like this:

{
   "name": "Christian",
   "aliases": ["נוצרי", "کریستیان"]
}

Suppose I have one document with 10 aliases and another with 2 aliases, and both contain an alias with the value کریستیان.

The field length (dl) of the first document is larger than that of the second, so the term frequency (tf) component of the first document's score is lower than the second's. As a result, the document with fewer aliases gets a higher score than the other one.
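To make the effect concrete, here is a minimal sketch of the tf component in the form that Elasticsearch 7 reports in its explain output, using the default BM25 parameters k1 = 1.2 and b = 0.75. The average field length (AVGDL = 6.0) and the assumption that each alias is a single token are made up for illustration. The second function is a simplified model of scoring without length normalization (equivalent to b = 0); it is not a claim about the exact Lucene internals.

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf component of BM25: shrinks as the field length dl grows."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def bm25_tf_without_length(freq, k1=1.2):
    """Simplified model with the dl / avgdl term removed (same as b = 0)."""
    return freq / (freq + k1)


# Hypothetical average length of the aliases field across the index.
AVGDL = 6.0

# The matching alias occurs once in each document (freq = 1).
tf_long = bm25_tf(freq=1, dl=10, avgdl=AVGDL)   # document with 10 aliases
tf_short = bm25_tf(freq=1, dl=2, avgdl=AVGDL)   # document with 2 aliases
tf_flat = bm25_tf_without_length(freq=1)        # length ignored

print(f"10 aliases: {tf_long:.3f}")
print(f" 2 aliases: {tf_short:.3f}")
print(f"no length normalization: {tf_flat:.3f}")
```

With length normalization the 2-alias document gets the larger tf value; in the simplified model both documents get the same value, which is the behavior asked for here.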

Sometimes I want to add more aliases for a person, in different languages and different forms, because he or she is more famous, but doing so lowers that person's score in the results. I want to take the length of the aliases field out of my query's score calculation.


Solution

  • Norms store the relative length of the field.

    How long is the field? The shorter the field, the higher the weight. If a term appears in a short field, such as a title field, it is more likely that the content of that field is about the term than if the same term appears in a much bigger body field.

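    Norms can be disabled on an existing field using the PUT mapping API. The example below uses a field named title; for the documents in the question, use aliases instead. Once disabled, norms cannot be re-enabled on that field, and existing norms are only dropped gradually as old segments are merged, so scores may be inconsistent until then.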

    PUT my_index/_mapping
    {
      "properties": {
        "title": {
          "type": "text",
          "norms": false
        }
      }
    }
    

    Links for further study

    1. https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html#field-norm
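
The PUT mapping request from the answer can also be built from a script. The following is a minimal sketch using only the Python standard library; the base URL http://localhost:9200, the index name my_index, and the helper name build_disable_norms_request are assumptions for illustration, and the field is set to aliases to match the question. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_disable_norms_request(base_url, index, field):
    """Build (but do not send) a PUT _mapping request that disables norms."""
    body = {"properties": {field: {"type": "text", "norms": False}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_mapping",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local cluster and index name; adjust to your setup.
req = build_disable_norms_request("http://localhost:9200", "my_index", "aliases")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually apply the mapping against a running cluster:
# urllib.request.urlopen(req)
```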