
How can I boost the field-length norm in an Elasticsearch function score?


I know that Elasticsearch takes into account the length of a field when computing the score of the documents retrieved by a query. The shorter the field, the higher the weight (see The field-length norm).

I like this behaviour: when I search for iphone I am much more interested in iphone 6 than in Crappy accessories for: iphone 5 iphone 5s iphone 6.

Now I would like to boost this effect; let's say I want to double the importance of the field-length norm.

I know that the score can be modified with a function_score query, and I guess that I can achieve what I want via script_score.

I tried to add another field-length norm to the score like this:

    {
      "query": {
        "function_score": {
          "boost_mode": "replace",
          "query": {...},
          "script_score": {
            "script": "_score + norm(doc)"
          }
        }
      }
    }

But I failed badly, getting this error: [No parser for element [function_score]]

EDIT:

My first error was that I hadn't wrapped the function score in a "query". I have now edited the code above. The new error is:

    GroovyScriptExecutionException[MissingMethodException
    [No signature of method: Script5.norm() is applicable for argument types:
    (org.elasticsearch.search.lookup.DocLookup) values:
    [<org.elasticsearch.search.lookup.DocLookup@2c935f6f>]
    Possible solutions: notify(), wait(), run(), run(), dump(), any()]]

EDIT: I provided a first answer, but I'm hoping for a better one


Solution

  • It looks like you could achieve that using a field of type token_count together with a field_value_factor function score.

    So, something like this in the field mapping:

    "name": { 
      "type": "string",
      "fields": {
        "length": { 
          "type":     "token_count",
          "analyzer": "standard"
        }
      }
    }
    

    This indexes the number of tokens in the field as name.length. If you want to use the number of characters instead, you can change the analyzer from standard to a custom one that emits each character as a separate token.
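
    A minimal sketch of such a per-character analyzer, assuming an ngram tokenizer with both gram sizes set to 1. The names single_char and per_character are made up for illustration, and the token_chars setting is an assumption about what should be counted: it limits the count to letters and digits, so spaces and punctuation are skipped.

    ```json
    {
      "settings": {
        "analysis": {
          "tokenizer": {
            "single_char": {
              "type": "ngram",
              "min_gram": 1,
              "max_gram": 1,
              "token_chars": ["letter", "digit"]
            }
          },
          "analyzer": {
            "per_character": {
              "type": "custom",
              "tokenizer": "single_char"
            }
          }
        }
      }
    }
    ```

    The length sub-field would then reference it with "analyzer": "per_character" in place of "standard".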

    Then in the query:

    "function_score": {
      ...,
      "field_value_factor": {
        "field": "name.length",
        "modifier": "reciprocal"
      }
    }
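
    For illustration, a complete request body might look like the sketch below. The match query on name is only a placeholder for whatever query is elided above, and boost_mode is spelled out even though multiply is the default.

    ```json
    {
      "query": {
        "function_score": {
          "query": {
            "match": { "name": "iphone" }
          },
          "field_value_factor": {
            "field": "name.length",
            "modifier": "reciprocal"
          },
          "boost_mode": "multiply"
        }
      }
    }
    ```

    With these settings the final score is the query's _score multiplied by 1 / name.length, so a two-token name keeps half of its score while a nine-token name keeps only a ninth.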