Search code examples
elasticsearchelasticsearch-5elasticsearch-pluginelasticsearch-dslelasticsearch-query

ElasticSearch: Fetch records from nested Array that "only" include given element/s and filter-out the rest with mixed values


I am stuck on one of my tasks.

Overview:

  • There are some records on elastic search. Which includes information about the candidates and their employment.
  • There is a field that stores information about the statuses in which the candidate got submitted.
 {
    "submittedJobs": [
        {
            "status": "PendingPM", "jobId": "ABC", ...
        },
        {
            "status": "PendingClient", "jobId": "XYZ", ...
        },
        {
            "status": "PendingPM", "jobId": "WXY", ...
        },
        ...
    ]
}

I want to write an es query to fetch all the records in which submitted jobs array "only" have "pendingPM" statuses and no other statuses.

"query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "submittedJobs",
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "submittedJobs.status.keyword": "PendingPM"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }

I tried this query, and it returns the records which include "pendingPM" along with other statuses - might use contains() logic.

here is the mapping

"submittedJobs": {
    "type": "nested",
    "properties": {
        "statusId": {
            "type": "long"
        },
        "status": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256,
                    "normalizer": "lowercase_normalizer"
                }
            }
        },
        "jobId": {
            "type": "keyword"
        }
    }
}

For example. let's suppose there are two documents

document #1:
{
    "submittedJobs": [
        {
            "status": "PendingPM", "jobId": "ABC", ...
        },
        {
            "status": "PendingClient", "jobId": "XYZ", ...
        },
        {
            "status": "PendingPM", "jobId": "WXY", ...
        },
        ...
    ]
},

document #2:
{
    "submittedJobs": [
        {
            "status": "PendingPM", "jobId": "ABC", ...
        },
        {
            "status": "PendingPM", "jobId": "WXY", ...
        },
        ...
    ]
}

Only document #2 should be returned, as the entire array contains only "PendingPM" and no other statuses.

Document #1 will be filtered-out since it includes mixed statuses.

Any help will be appreciated.


Solution

  • Try this:

    Will be return only document with all item of array with status PendingPM.

    {
      "query": {
        "bool": {
          "must_not": [
            {
              "nested": {
                "path": "submittedJobs",
                "query": {
                  "bool": {
                    "must_not": [
                      {
                        "match": {
                          "submittedJobs.status": {
                            "query": "PendingPM"
                          }
                        }
                      },
                      {
                        "match": {
                          "submittedJobs.status": {
                            "query": "PendingClient"
                          }
                        }
                      }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }