How to use function_score with a nested object in Elasticsearch


I have an index with a nested object property as follows:

PUT /mycvs
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "experiences": {
        "type": "nested",
        "properties": {
          "date": {
            "type": "date"
          },
          "tools": {
            "type": "text"
          }
        }
      }
    }
  }
}

With some data:

POST /mycvs/_doc
{
  "name": "Michael",
  "experiences": [
    { "date": "2023-03-13T19:50:14.820Z", "tools": ["alpha", "beta"] },
    { "date": "2022-03-13T19:50:14.820Z", "tools": ["alpha", "beta"] },
    { "date": "2021-03-13T19:50:14.820Z", "tools": ["beta", "gamma"] }
  ]
}
POST /mycvs/_doc
{
  "name": "Pam",
  "experiences": [
    { "date": "2023-03-13T19:50:14.820Z", "tools": ["beta"] },
    { "date": "2020-03-13T19:50:14.820Z", "tools": ["gamma"] },
    { "date": "2019-03-13T19:50:14.820Z", "tools": ["beta"] }
  ]
}
POST /mycvs/_doc
{
  "name": "Dwight",
  "experiences": [
    { "date": "2022-03-13T19:50:14.820Z", "tools": ["beta"] },
    { "date": "2021-03-13T19:50:14.820Z", "tools": ["gamma", "beta"] },
    { "date": "2021-03-13T19:50:14.820Z", "tools": ["gamma"] }
  ]
}

When I run the following query I get all 3 documents back, which is what I want, but I don't understand why Michael comes last in the results.

Since "beta" appears in all 3 of his experiences, I expected him to rank first. How can I get him to the top of the results?

GET /mycvs/_search
{
  "query": {
    "nested": {
      "path": "experiences",
      "query": {
        "match_phrase": { "experiences.tools": { "query": "beta" } }
      }
    }
  }
}

My final goal is to factor the date field into the ranking: find all the CVs that mention a given tool, rank them by the number of occurrences, and boost the score when the tool appears in a recent experience. This is what I tried:

GET /mycvs/_search
{
  "query": {
    "nested": {
      "path": "experiences",
      "query": {
        "function_score": {
          "query": {
            "match_phrase": { "experiences.tools": { "query": "beta" } }
          },
          "score_mode": "multiply",
          "functions": [
            { "filter": { "range": { "experiences.date": { "gte": "now", "lt": "now-1y" } } }, "weight": 5 },
            { "filter": { "range": { "experiences.date": { "gte": "now-1y", "lt": "now-2y" } } }, "weight": 4 },
            { "filter": { "range": { "experiences.date": { "gte": "now-2y", "lt": "now-3y" } } }, "weight": 3 },
            { "filter": { "range": { "experiences.date": { "gte": "now-3y", "lt": "now-4y" } } }, "weight": 2 },
            { "filter": { "range": { "experiences.date": { "gte": "now-4y", "lt": "now-5y" } } }, "weight": 1 },
            { "filter": { "range": { "experiences.date": { "gte": "now-5y" } } }, "weight": 1 }
          ]
        }
      }
    }
  }
}

But it doesn't seem to work either.

What am I doing wrong? Any clue on this?

I've tried this on both Elasticsearch 7.17.0 and 8.6.2.

Thanks a lot


Solution

  • I ended up using a script_score like this:

    GET /mycvs/_search
    {
      "query": {
        "nested": {
          "path": "experiences",
          "query": {
            "script_score": {
              "query": {
                "match_phrase": {
                  "experiences.tools": {
                    "query": "beta"
                  }
                }
              },
              "script": {
                "source": """
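                        // Whole years elapsed since this experience's date (integer division).
                        long yearsSinceExperience = (new Date().getTime() - doc['experiences.date'].value.toInstant().toEpochMilli()) / 1000 / 86400 / 365;
                        // Experiences less than 5 years old score 5 minus their age in years; older ones score a flat 1.
                        if (yearsSinceExperience < 5) {
                          return 5 - yearsSinceExperience;
                        } else {
                          return 1;
                        }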
                    """
              }
            }
          }
        }
      }
    }
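
A possible alternative that stays with function_score, offered as an untested sketch rather than a verified fix. Two things in the question's queries look likely to matter. First, the nested query combines the scores of matching child documents with score_mode "avg" by default, so three matching experiences do not add up; setting "score_mode": "sum" on the nested query makes each matching experience contribute. Second, the range filters in the function_score attempt have their bounds swapped ("gte": "now" together with "lt": "now-1y" describes an empty interval), so the first five weights can never apply. The body below, meant for GET /mycvs/_search, puts the older bound in "gte" and the newer one in "lt". The explicit "boost_mode": "multiply" is already the default and is spelled out only for readability; the weights themselves are copied from the question.

```json
{
  "query": {
    "nested": {
      "path": "experiences",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": {
            "match_phrase": { "experiences.tools": { "query": "beta" } }
          },
          "boost_mode": "multiply",
          "functions": [
            { "filter": { "range": { "experiences.date": { "gte": "now-1y" } } }, "weight": 5 },
            { "filter": { "range": { "experiences.date": { "gte": "now-2y", "lt": "now-1y" } } }, "weight": 4 },
            { "filter": { "range": { "experiences.date": { "gte": "now-3y", "lt": "now-2y" } } }, "weight": 3 },
            { "filter": { "range": { "experiences.date": { "gte": "now-4y", "lt": "now-3y" } } }, "weight": 2 },
            { "filter": { "range": { "experiences.date": { "gte": "now-5y", "lt": "now-4y" } } }, "weight": 1 },
            { "filter": { "range": { "experiences.date": { "lt": "now-5y" } } }, "weight": 1 }
          ]
        }
      }
    }
  }
}
```

Because the date buckets do not overlap, at most one weight applies to each experience, and the nested "sum" then adds the weighted scores of all matching experiences per CV. Relevance scoring on a text field also favors shorter field values, so an experience listing only "beta" can still outscore one listing "alpha" and "beta"; whether the sum is enough to offset that is worth checking against real data.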