Search code examples
elasticsearchelasticsearch-queryelasticsearch-nested

Return only elements of an array in an object that contain a certain value


I've got the following document in an elastic search index:

{
    "type": "foo",
    "components": [{
            "id": "1234123", ,
            "data_collections": [{
                    "date_time": "2020-03-02T08:14:48+00:00",
                    "group": "1",
                    "group_description": "group1",
                    "measures": [{
                            "measure_name": "MEASURE_1",
                            "actual": "23.34"
                        }, {
                            "measure_name": "MEASURE_2",
                            "actual": "5"
                        }, {
                            "measure_name": "MEASURE_3",
                            "actual": "string_message"
                        }, {
                            "measure_name": "MEASURE_4",
                            "actual": "another_string"
                        }
                    ]
                },
                {
                    "date_time": "2020-03-03T08:14:48+00:00",
                    "group": "2",
                    "group_description": "group2",
                    "measures": [{
                            "measure_name": "MEASURE_1",
                            "actual": "23.34"
                        }, {
                            "measure_name": "MEASURE_4",
                            "actual": "foo"
                        }, {
                            "measure_name": "MEASURE_5",
                            "actual": "bar"
                        }, {
                            "measure_name": "MEASURE_6",
                            "actual": "4"
                        }
                    ]
                }
            ]
        }
    ]
}

Now I'm trying to figure out a mapping and a query for this document so the result would only contain the groups and measure_names I am interesed in. So far I'm able to query but I'll always retrieve the whole document which is not feasible since the array of measures can be quite large and most of the time I'd like a small subset.

For example I'm search for documents with "group": "1" and "measure_name": "MEASURE_" and the result I'd like to achieve looks like this:

{
    "_id": "oiqwueou8931283u12",
    "_source": {
        "type": "foo",
        "components": [{
                "id": "1234123", ,
                "data_collections": [{
                        "date_time": "2020-03-02T08:14:48+00:00",
                        "group": "1",
                        "group_description": "group1",
                        "measures": [{
                                "measure_name": "MEASURE_1",
                                "actual": "23.34"
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

I think what comes close to what I am looking for is the source parameter, but as far as I know there is no way to filter for values like {"measure_name": {"value": "MEASURE_1"}}

Thanks.


Solution

  • The simplest mapping that comes to mind is

    PUT timo
    {
      "mappings": {
        "properties": {
          "components": {
            "type": "nested",
            "properties": {
              "data_collections": {
                "type": "nested",
                "properties": {
                  "measures": {
                    "type": "nested"
                  }
                }
              }
            }
          }
        }
      }
    }
    

    and the search query would be

    GET timo/_search
    {
      "_source": ["inner_hits", "type", "components.id"], 
      "query": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "components.data_collections",
                "query": {
                  "term": {
                    "components.data_collections.group.keyword": {
                      "value": "1"
                    }
                  }
                },
                "inner_hits": {}
              }
            },
            {
              "nested": {
                "path": "components.data_collections.measures",
                "query": {
                  "term": {
                    "components.data_collections.measures.measure_name.keyword": {
                      "value": "MEASURE_1"
                    }
                  }
                },
                "inner_hits": {}
              }
            }
          ]
        }
      }
    }
    

    Notice the inner_hits param under each subquery and that the _source param is limited so that we don't return the whole hit, but rather only the subgroups that did match. type and component.id cannot be "seen" in the nested fields so we've included them explicitly.

    The response should then look like this: enter image description here

    You now have precisely the attributes you need so a bit of post-processing will get you the desired format!


    I'm not familiar w/ a cleaner way of doing this but if any of y'all do, I'd be glad to learn it.