Elasticsearch Function Scoring based on max score within array / nested


I have a field in my document that stores an array of objects, each holding Integer values.

Java Class:

import java.util.List;

public class Clazz {
    public List<Foo> foo;

    // One element of the foo array
    public static class Foo {
        public Integer bar;
        public Integer baz;
    }
}

Mapping:

"properties" : {
    "foo" : {
        "properties" : {
          "bar" : {
            "type" : "integer"
          },
          "baz" : {
            "type" : "integer"
          }
        }
    }
}

Example documents:

{
    "id": 1,
    "foo": [
        { "bar": 10 },
        { "bar": 20 }
    ]
}

{
    "id": 2,
    "foo": [
        { "bar": 15 }
    ]
}
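Because foo is mapped as a plain object field rather than as a nested type, Elasticsearch flattens the array of objects, so foo.bar behaves like a single multi-valued field per document. The Java sketch below only models that flattening for the example documents; the class FlattenSketch and the method barValues are hypothetical names, and the order in which Elasticsearch returns the values is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of the Elasticsearch API.
final class FlattenSketch {
    static final class Foo {
        final Integer bar;
        final Integer baz;

        Foo(Integer bar, Integer baz) {
            this.bar = bar;
            this.baz = baz;
        }
    }

    // Collects every non-null bar value of a document into one flat list.
    static List<Integer> barValues(List<Foo> foos) {
        List<Integer> out = new ArrayList<>();
        for (Foo f : foos) {
            if (f.bar != null) {
                out.add(f.bar);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Foo> doc1 = Arrays.asList(new Foo(10, null), new Foo(20, null));
        List<Foo> doc2 = Arrays.asList(new Foo(15, null));
        System.out.println(barValues(doc1)); // [10, 20]
        System.out.println(barValues(doc2)); // [15]
    }
}
```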

Now I'd like to do my scoring. The scoring function is given an input value: 10.

The scoring function is basically: "The closer foo.bar is to the input, the higher the score. If foo.bar is lower than the input, the score is only half as good."
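For a single value, the rule can be sketched in plain Java as follows. This mirrors the arithmetic of the script in the query (an exact match scores 0, and the score becomes more negative with distance); the class name ProximityScore and the method name score are hypothetical.

```java
// Hypothetical illustration of the scoring rule for one foo.bar value.
final class ProximityScore {
    // Returns 0 for an exact match; the further value is from input, the lower the score.
    // Values below input are penalized twice as hard as values above it.
    static int score(int value, int input) {
        if (value >= input) {
            return input - value;
        }
        return (value - input) * 2;
    }

    public static void main(String[] args) {
        System.out.println(score(10, 10)); // 0
        System.out.println(score(15, 10)); // -5
        System.out.println(score(5, 10));  // -10
    }
}
```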

The query:

"function_score" : {
    "functions" : [ {
        "script_score" : {
            "script" : "if(doc['foo.bar'].value >= input) { (input - doc['foo.bar'].value) * 1 } else { (doc['foo.bar'].value - input) * 2 }",
            "lang" : "groovy",
            "params" : {
                "input" : 10
            }
        }
    } ],
    "score_mode" : "max",
    "boost_mode" : "replace"
}

Expected result:

id 1 should be first, because there's a foo.bar that matches input=10.

What happens:

The scoring works perfectly if each document has only a single foo.bar value. If it's an array (as in the document with id 1), Elasticsearch seems to take the last value within the array.

What the query should do:

Take the best score. That's why I used score_mode: max. But it seems that this only applies across the entries of the functions array within function_score, and not (as I expected) across the possible scores within a single function.


I read somewhere about using doc['foo.bar'].values (values instead of value), but I don't know how to use it in this case.

Do you have any idea how to get this working?


Solution

  • One way to achieve this in Groovy is to work on doc["foo.bar"].values, which exposes all values of the field for a document as a list, and to call the list's max() method.

    Example:

    {
       "query": {
          "function_score": {
             "functions": [
                {
                   "script_score": {
                      "script": "doc[\"foo.bar\"].values.collect { it >= input ? (input - it) : (it - input) * 2 }.max()",
                      "lang": "groovy",
                      "params": {
                         "input": 10
                      }
                   }
                }
             ],
             "score_mode": "max",
             "boost_mode": "replace"
          }
       }
    }
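    As a way to check the intended ranking outside Elasticsearch, here is a minimal Java sketch that scores every foo.bar value of a document and keeps the best result. It is only a model of the idea, not Elasticsearch code; the class BestValueScore and its methods are hypothetical names, and the scoring arithmetic is taken from the script in the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: best score over all values of a multi-valued field.
final class BestValueScore {
    // Score for one value: 0 for an exact match, doubled penalty below input.
    static int score(int value, int input) {
        return value >= input ? input - value : (value - input) * 2;
    }

    // Scores each value and returns the highest score.
    // Returns Integer.MIN_VALUE when the document has no values.
    static int bestScore(List<Integer> values, int input) {
        int best = Integer.MIN_VALUE;
        for (int v : values) {
            best = Math.max(best, score(v, input));
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(bestScore(Arrays.asList(10, 20), 10)); // 0  (document 1)
        System.out.println(bestScore(Arrays.asList(15), 10));     // -5 (document 2)
    }
}
```

    With input = 10, document 1 gets the higher score in this sketch, which matches the expected result from the question.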