Search code examples
elasticsearchelasticsearch-dslelasticsearch-7

ElasticSearch: Query nested array for empty and specific value in single query


Documents structure -

{
  "hits": [
    {
      "_type": "_doc",
      "_id": "ef0a2c44179a513476b080cc2a585d95",
      "_source": {
        "DIVISION_NUMBER": 44,
        "MATCHES": [
          {
            "MATCH_STATUS": "APPROVED",
            "UPDATED_ON": 1599171303000
          }
        ]
      }
    },
    {
      "_type": "_doc",
      "_id": "ef0a2c44179a513476b080cc2a585d95",
      "_source": {
        "DIVISION_NUMBER": 44,
        "MATCHES": [ ]
      }
    }
  ]
}

Question - MATCHES is a nested array inside there is a text field MATCH_STATUS that can have any values say "APPROVED","REJECTED". I am looking to search ALL documents that contain MATCH_STATUS having values say "APPROVED", "RECOMMENDED" as well as where there is no data in MATCHES (empty array "MATCHES": [ ]). Please note I want this in a single query.

I am able to do this in two separate queries like this -

GET all matches with status = RECOMMENDED, APPROVED

"must": [
        {
          "nested": {
            "path": "MATCHES",
            "query": {
              "terms": {
                "MATCHES.MATCH_STATUS.keyword": [
                  "APPROVED",
                  "RECOMMENDED"
                ]
              }
            }
          }
        }
      ]
 

GET all matches having empty array "MATCHES" : [ ]

{
  "size": 5000,
  "query": {
    "bool": {
      "filter": [],
      "must_not": [
        {
          "nested": {
            "path": "MATCHES",
            "query": {
              "exists": {
                "field": "MATCHES"
              }
            }
          }
        }
      ]
    }
  },
  "from": 0
}

Solution

  • You can combine both queries using should clause.

    {
      "query": {
        "bool": {
          "minimum_should_match": 1, 
          "should": [
            {
              "nested": {
                "path": "MATCHES",
                "query": {
                  "bool": {
                    "minimum_should_match": 1,
                    "should": [
                      {
                        "terms": {
                          "MATCHES.MATCH_STATUS.keyword": [
                            "APPROVED",
                            "RECOMMENDED"
                          ]
                        }
                      }
                    ]
                  }
                }
              }
            },
            {
              "bool": {
                "must_not": [
                  {
                    "nested": {
                      "path": "MATCHES",
                      "query": {
                        "bool": {
                          "filter": {
                            "exists": {
                              "field": "MATCHES"
                            }
                          }
                        }
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
    

    Update: To answer your comment.

    Missing aggregation does not support nested field for now. There is open issue as of now.

    To get count of empty matches, you can use a filter aggregation with the nested query wrapped into the must_not clause of the bool query.

    {
      "aggs": {
        "missing_matches_agg": {
          "filter": {
            "bool": {
              "must_not": {
                "nested": {
                  "query": {
                    "match_all": {}
                  },
                  "path": "MATCHES"
                }
              }
            }
          }
        }
      }
    }