
Elasticsearch DSL queries - optional should terms & scores


I'm pretty new to the Elasticsearch world and I might be missing some concepts.

Here is the scenario I don't understand:

I want to find docs that match the following criteria:

  • category.level = A
  • category.name = "John G." OR "Chris T."
  • approved = yes (optional)

Mappings:

PUT data
{
  "mappings": {
    "properties": {
      "createdAt": {
        "type":   "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSSZ"
      },
      "category": {
        "type": "nested",
        "properties": {
          "name": {
            "type":   "text",
            "analyzer": "keyword"
          }
        }
      },
      "approved": {
        "type":   "text",
        "analyzer": "keyword"
      }
    }
  }
}

Data:

POST data/_create/1
{  
  "category": [
      {
        "name": "John G.",
        "level": "A"
      },
      {
        "name": "Mary F.",
        "level": "A"
      }
  ],
  "createdBy": "John",
  "createdAt": "2022-04-18 19:09:27.527+0200",
  "approved": "yes"
}

POST data/_create/2
{  
  "category": [
      {
        "name": "John G.",
        "level": "A"
      },
      {
        "name": "Chris T.",
        "level": "A"
      }
  ],
  "createdBy": "John",
  "createdAt": "2022-04-18 19:09:27.527+0200",
  "approved": "no"
}

POST data/_create/3
{  
  "category": [
      {
        "name": "John G.",
        "level": "C"
      },
      {
        "name": "Phil C.",
        "level": "C"
      }
  ],
  "createdBy": "John",
  "createdAt": "2022-04-18 19:09:27.527+0200",
  "approved": "no"
}

POST data/_create/4
{  
  "category": [
      {
        "name": "John G.",
        "level": "A"
      },
      {
        "name": "Chris T.",
        "level": "A"
      }
  ],
  "createdBy": "John",
  "createdAt": "2020-04-18 19:09:27.527+0200",
  "approved": "yes"
}

POST data/_create/5
{  
  "category": [
      {
        "name": "Unknown A.",
        "level": "A"
      },
      {
        "name": "Unknown B.",
        "level": "A"
      }
  ],
  "createdBy": "Unknown",
  "createdAt": "2020-08-18 19:09:27.527+0200",
  "approved": "yes"
}

Query:

GET data/_search
{
  "query": {
    "nested": {
      "path": "category",
      "query": {
        "bool": {
          "must": [
            {"match": {"category.level": "A"}}
          ],
          "should": [
            {"term": {"category.name": "John G."}},
            {"term": {"category.name": "Chris T."}},
            {"term": {"approved": "yes"}}
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.4455402,
    "hits" : [
      {
        "_index" : "data",
        "_id" : "2",
        "_score" : 1.4455402,
        "_source" : {
          "category" : [
            {
              "name" : "John G.",
              "level" : "A"
            },
            {
              "name" : "Chris T.",
              "level" : "A"
            }
          ],
          "createdBy" : "John",
          "createdAt" : "2022-04-18 19:09:27.527+0200",
          "approved" : "no"
        }
      },
      {
        "_index" : "data",
        "_id" : "4",
        "_score" : 1.4455402,
        "_source" : {
          "category" : [
            {
              "name" : "John G.",
              "level" : "A"
            },
            {
              "name" : "Chris T.",
              "level" : "A"
            }
          ],
          "createdBy" : "John",
          "createdAt" : "2020-04-18 19:09:27.527+0200",
          "approved" : "yes"
        }
      },
      {
        "_index" : "data",
        "_id" : "1",
        "_score" : 1.151647,
        "_source" : {
          "category" : [
            {
              "name" : "John G.",
              "level" : "A"
            },
            {
              "name" : "Mary F.",
              "level" : "A"
            }
          ],
          "createdBy" : "John",
          "createdAt" : "2022-04-18 19:09:27.527+0200",
          "approved" : "yes"
        }
      }
    ]
  }
}

Questions:

  1. Why is the first document returned one with approved = no? I was expecting docs with approved = yes to score higher.
  2. Why is the doc with _id = 5 not returned? It doesn't meet the category.name criterion, but it does meet approved = yes.
  3. The optionality of approved = yes is not expressed in the query above. How could I add a separate should clause with minimum_should_match: 0? Something that would increase the score but not filter the results.

Solution

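  • Use the query below. Its top-level bool query has a must clause containing the nested query. Inside the nested query, a bool query requires a match on category.level and, through an inner bool query with a should clause, at least one of the two category.name values. Note that the term query on category.level uses a lowercase "a": the field has no explicit mapping, so it is dynamically mapped as text and the standard analyzer lowercases it at index time.

    The top-level bool query also has a should clause on approved. It sits outside the nested query and only boosts the score of documents with the value yes.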

    POST data/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "category",
                "query": {
                  "bool": {
                    "must": [
                      {
                        "term": {
                          "category.level": {
                            "value": "a"
                          }
                        }
                      },
                      {
                        "bool": {
                          "should": [
                            {
                              "term": {
                                "category.name": "John G."
                              }
                            },
                            {
                              "term": {
                                "category.name": "Chris T."
                              }
                            }
                          ]
                        }
                      }
                    ]
                  }
                }
              }
            }
          ],
          "should": [
            {
              "term": {
                "approved": "yes"
              }
            }
          ]
        }
      }
    }
    

    Result:

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : 1.9845366,
        "hits" : [
          {
            "_index" : "data",
            "_type" : "_doc",
            "_id" : "4",
            "_score" : 1.9845366,
            "_source" : {
              "category" : [
                {
                  "name" : "John G.",
                  "level" : "A"
                },
                {
                  "name" : "Chris T.",
                  "level" : "A"
                }
              ],
              "createdBy" : "John",
              "createdAt" : "2020-04-18 19:09:27.527+0200",
              "approved" : "yes"
            }
          },
          {
            "_index" : "data",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.6906434,
            "_source" : {
              "category" : [
                {
                  "name" : "John G.",
                  "level" : "A"
                },
                {
                  "name" : "Mary F.",
                  "level" : "A"
                }
              ],
              "createdBy" : "John",
              "createdAt" : "2022-04-18 19:09:27.527+0200",
              "approved" : "yes"
            }
          },
          {
            "_index" : "data",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 1.4455402,
            "_source" : {
              "category" : [
                {
                  "name" : "John G.",
                  "level" : "A"
                },
                {
                  "name" : "Chris T.",
                  "level" : "A"
                }
              ],
              "createdBy" : "John",
              "createdAt" : "2022-04-18 19:09:27.527+0200",
              "approved" : "no"
            }
          }
        ]
      }
    }
    
    

    Why is the first document returned one with approved = no? I was expecting docs with approved = yes to score higher.

    Because your should clause on approved is inside the nested query. approved is a top-level field, outside category, so that clause never matches a nested document and does not change the score.
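
    One way to check this, as a sketch against the same index: run the approved clause on its own inside a nested query. With the mapping above it should return no hits, because the hidden nested documents only index the category.* fields.

    GET data/_search
    {
      "query": {
        "nested": {
          "path": "category",
          "query": {
            "term": {
              "approved": "yes"
            }
          }
        }
      }
    }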

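    Why is the doc with _id = 5 not returned? It doesn't meet the category.name criterion, but it does meet approved = yes.

    It is filtered out because the category.name criterion is effectively mandatory. In your query, minimum_should_match: 1 requires at least one should clause to match, and the approved clause cannot match inside the nested query. In the query above, the nested query sits in a must clause. If you need the document with _id = 5 as well, you can use two should clauses at the top level, one for the nested query and one for approved, and that will resolve your issue.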
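
    A minimal sketch of that two-should-clause variant, assuming the same index and reusing the clauses from the query above. minimum_should_match: 1 is written out only for clarity, since it is already the default when a bool query has no must or filter clause. With the sample data it should also match the document with _id = 5 and still exclude _id = 3, but note that any approved = yes document now matches regardless of its category.

    POST data/_search
    {
      "query": {
        "bool": {
          "should": [
            {
              "nested": {
                "path": "category",
                "query": {
                  "bool": {
                    "must": [
                      {
                        "term": {
                          "category.level": {
                            "value": "a"
                          }
                        }
                      },
                      {
                        "bool": {
                          "should": [
                            {
                              "term": {
                                "category.name": "John G."
                              }
                            },
                            {
                              "term": {
                                "category.name": "Chris T."
                              }
                            }
                          ]
                        }
                      }
                    ]
                  }
                }
              }
            },
            {
              "term": {
                "approved": "yes"
              }
            }
          ],
          "minimum_should_match": 1
        }
      }
    }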

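    Your question 3 is also resolved by the query above: because the top-level bool query has a must clause, its should clause is optional (minimum_should_match defaults to 0 there), so it raises the score without filtering the results.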