Tags: elasticsearch, elasticsearch-aggregation, elasticsearch-dsl

Elasticsearch: filter documents whose array elements are all contained in an array passed in the request


My documents stored in Elasticsearch have the following structure:

{
  "id": 1,
  "test": "name",
  "rules": [
    {
      "id": 2,
      "name": "rule1",
      "ruleDetails": [
        {
          "id": 3,
          "requiredAnswerId": 1
        },
        {
          "id": 4,
          "requiredAnswerId": 2
        },
        {
          "id": 5,
          "requiredAnswerId": 3
        }
      ]
    }
  ]
}

Here, the rules property has the nested type.

I need to find documents in which every rules.ruleDetails.requiredAnswerId stored in the document is contained in the array of requiredAnswerId values passed in the search request (the provided terms). In other words, the stored values must be a subset of the provided values.

Does anyone know which Elasticsearch feature I can use to build such a query? Or would it be better to fetch the whole document and filter at the application level?

UPDATE: adding the mapping

{
  "my_index": {
    "mappings": {
      "properties": {
        "id": {
          "type": "long"
        },
        "test": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "rules": {
          "type": "nested",
          "properties": {
            "id": {
              "type": "long"
            },
            "name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "ruleDetails": {
              "properties": {
                "id": {
                  "type": "long"
                },
                "requiredAnswerId": {
                  "type": "long"
                }
              }
            }
          }
        }
      }
    }
  }
}

Solution

  • Mapping:

    {
      "index4" : {
        "mappings" : {
          "properties" : {
            "id" : {
              "type" : "integer"
            },
            "rules" : {
              "type" : "nested",
              "properties" : {
                "id" : {
                  "type" : "integer"
                },
                "name" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword"
                    }
                  }
                },
                "ruleDetails" : {
                  "properties" : {
                    "id" : {
                      "type" : "long"
                    },
                    "requiredAnswerId" : {
                      "type" : "long"
                    }
                  }
                }
              }
            },
            "test" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword"
                }
              }
            }
          }
        }
      }
    }
    

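    Query: This requires a script query, which is not good from a performance perspective. The script loops over every requiredAnswerId value of each nested rules object and returns false as soon as a value is not present in the passed parameters; if all values are present, it returns true.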

    {
      "query": {
        "nested": {
          "path": "rules",
          "query": {
            "script": {
              "script": {
                "source": "for(a in doc['rules.ruleDetails.requiredAnswerId']){if(!params.Ids.contains((int)a)) return false; }  return true;",
                "params": {
                  "Ids": [
                    1,
                    2,
                    3
                  ]
                }
              }
            }
          },
          "inner_hits": {}
        }
      }
    }
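
The check performed by the script can be sketched in Python, which is also roughly what filtering at the application level would look like. This is a minimal sketch, not part of the original answer: the function names rule_matches and matching_rules are hypothetical, and the second rule in the sample document (rule2, with requiredAnswerId 4) is invented to show a non-matching case. As in the script, a rule with no ruleDetails matches, because there is no value that could be missing.

```python
from typing import Iterable, List


def rule_matches(rule: dict, provided_ids: Iterable[int]) -> bool:
    """Return True when every requiredAnswerId of the rule is in provided_ids."""
    allowed = set(provided_ids)
    return all(
        detail["requiredAnswerId"] in allowed
        for detail in rule.get("ruleDetails", [])
    )


def matching_rules(document: dict, provided_ids: Iterable[int]) -> List[dict]:
    """Return the nested rules of a document that pass the subset check."""
    allowed = set(provided_ids)
    return [rule for rule in document.get("rules", []) if rule_matches(rule, allowed)]


# Sample document: rule1 comes from the question, rule2 is hypothetical.
document = {
    "id": 1,
    "test": "name",
    "rules": [
        {
            "id": 2,
            "name": "rule1",
            "ruleDetails": [
                {"id": 3, "requiredAnswerId": 1},
                {"id": 4, "requiredAnswerId": 2},
                {"id": 5, "requiredAnswerId": 3},
            ],
        },
        {
            "id": 6,
            "name": "rule2",
            "ruleDetails": [{"id": 7, "requiredAnswerId": 4}],
        },
    ],
}

matched_names = [rule["name"] for rule in matching_rules(document, [1, 2, 3])]
print(matched_names)
```

With the provided IDs 1, 2 and 3, only rule1 passes, because rule2 requires an answer ID that was not provided.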
    

    Result:

      "hits" : [
          {
            "_index" : "index4",
            "_type" : "_doc",
            "_id" : "TxOpvnEBf42mOjxvvLQB",
            "_score" : 4.0,
            "_source" : {
              "id" : 1,
              "test" : "name",
              "rules" : [
                {
                  "id" : 2,
                  "name" : "rule1",
                  "ruleDetails" : [
                    {
                      "id" : 3,
                      "requiredAnswerId" : 1
                    },
                    {
                      "id" : 4,
                      "requiredAnswerId" : 2
                    },
                    {
                      "id" : 5,
                      "requiredAnswerId" : 3
                    }
                  ]
                },
                {
                  "id" : 3,
                  "name" : "rule3",
                  "ruleDetails" : [
                    {
                      "id" : 3,
                      "requiredAnswerId" : 1
                    },
                    {
                      "id" : 4,
                      "requiredAnswerId" : 2
                    }
                  ]
                }
              ]
            },
            "inner_hits" : {
              "rules" : {
                "hits" : {
                  "total" : {
                    "value" : 1,
                    "relation" : "eq"
                  },
                  "max_score" : 4.0,
                  "hits" : [
                    {
                      "_index" : "index4",
                      "_type" : "_doc",
                      "_id" : "TxOpvnEBf42mOjxvvLQB",
                      "_nested" : {
                        "field" : "rules",
                        "offset" : 0
                      },
                      "_score" : 4.0,
                      "_source" : {
                        "id" : 2,
                        "name" : "rule1",
                        "ruleDetails" : [
                          {
                            "id" : 3,
                            "requiredAnswerId" : 1
                          },
                          {
                            "id" : 4,
                            "requiredAnswerId" : 2
                          },
                          {
                            "id" : 5,
                            "requiredAnswerId" : 3
                          }
                        ]
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
    

    EDIT 1

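    The terms_set query can be used as an alternative. It should be faster than the script query.

    From the documentation: terms_set returns documents that contain a minimum number of exact terms in a provided field.

    minimum_should_match_script: the number of values in the document's array can be used as the minimum number of provided terms that must match, so a rule matches only when every one of its requiredAnswerId values is among the passed terms.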

    Query:

    {
      "query": {
        "nested": {
          "path": "rules",
          "query": {
            "bool": {
              "filter": {
                "terms_set": {
                  "rules.ruleDetails.requiredAnswerId": {
                    "terms": [
                      1,
                      2,
                      3
                    ],
                    "minimum_should_match_script": {
                      "source": "doc['rules.ruleDetails.requiredAnswerId'].size()"
                    }
                  }
                }
              }
            }
          },
          "inner_hits": {}
        }
      }
    }
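
To see why this query expresses "all stored values are among the provided terms", the matching rule can be modelled in plain Python. This is a rough sketch under stated assumptions, not Elasticsearch code: terms_set_matches and build_terms_set_query are hypothetical helper names, and the model assumes that size() counts every stored value. If a rule stored the same requiredAnswerId twice, the required count could then exceed the number of distinct matching terms, so that edge case is worth checking on a real index.

```python
from typing import Iterable, List

FIELD = "rules.ruleDetails.requiredAnswerId"


def terms_set_matches(field_values: List[int], terms: Iterable[int]) -> bool:
    """Model of terms_set with minimum_should_match_script = doc[field].size()."""
    # Number of provided terms that occur in the field.
    matched = len(set(terms) & set(field_values))
    # Assumption: size() returns the number of values stored for the field.
    required = len(field_values)
    return matched >= required


def build_terms_set_query(provided_ids: Iterable[int]) -> dict:
    """Build the request body shown above for an arbitrary list of IDs."""
    return {
        "query": {
            "nested": {
                "path": "rules",
                "query": {
                    "bool": {
                        "filter": {
                            "terms_set": {
                                FIELD: {
                                    "terms": list(provided_ids),
                                    "minimum_should_match_script": {
                                        "source": f"doc['{FIELD}'].size()"
                                    },
                                }
                            }
                        }
                    }
                },
                "inner_hits": {},
            }
        }
    }


print(terms_set_matches([1, 2, 3], [1, 2, 3]))  # all stored values provided
print(terms_set_matches([1, 2, 4], [1, 2, 3]))  # 4 was not provided
body = build_terms_set_query([1, 2, 3])
```

A rule storing 1, 2 and 4 yields only two matching terms against a required count of three, so it is rejected, which is the same outcome as the script query.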