I've got the following document in an elastic search index:
{
"type": "foo",
"components": [{
"id": "1234123", ,
"data_collections": [{
"date_time": "2020-03-02T08:14:48+00:00",
"group": "1",
"group_description": "group1",
"measures": [{
"measure_name": "MEASURE_1",
"actual": "23.34"
}, {
"measure_name": "MEASURE_2",
"actual": "5"
}, {
"measure_name": "MEASURE_3",
"actual": "string_message"
}, {
"measure_name": "MEASURE_4",
"actual": "another_string"
}
]
},
{
"date_time": "2020-03-03T08:14:48+00:00",
"group": "2",
"group_description": "group2",
"measures": [{
"measure_name": "MEASURE_1",
"actual": "23.34"
}, {
"measure_name": "MEASURE_4",
"actual": "foo"
}, {
"measure_name": "MEASURE_5",
"actual": "bar"
}, {
"measure_name": "MEASURE_6",
"actual": "4"
}
]
}
]
}
]
}
Now I'm trying to figure out a mapping and a query for this document so the result would only contain the groups and measure_names I am interesed in. So far I'm able to query but I'll always retrieve the whole document which is not feasible since the array of measures can be quite large and most of the time I'd like a small subset.
For example I'm search for documents with "group": "1"
and "measure_name": "MEASURE_"
and the result I'd like to achieve looks like this:
{
"_id": "oiqwueou8931283u12",
"_source": {
"type": "foo",
"components": [{
"id": "1234123", ,
"data_collections": [{
"date_time": "2020-03-02T08:14:48+00:00",
"group": "1",
"group_description": "group1",
"measures": [{
"measure_name": "MEASURE_1",
"actual": "23.34"
}
]
}
]
}
]
}
}
I think what comes close to what I am looking for is the source
parameter, but as far as I know there is no way to filter for values like {"measure_name": {"value": "MEASURE_1"}}
Thanks.
The simplest mapping that comes to mind is
PUT timo
{
"mappings": {
"properties": {
"components": {
"type": "nested",
"properties": {
"data_collections": {
"type": "nested",
"properties": {
"measures": {
"type": "nested"
}
}
}
}
}
}
}
}
and the search query would be
GET timo/_search
{
"_source": ["inner_hits", "type", "components.id"],
"query": {
"bool": {
"must": [
{
"nested": {
"path": "components.data_collections",
"query": {
"term": {
"components.data_collections.group.keyword": {
"value": "1"
}
}
},
"inner_hits": {}
}
},
{
"nested": {
"path": "components.data_collections.measures",
"query": {
"term": {
"components.data_collections.measures.measure_name.keyword": {
"value": "MEASURE_1"
}
}
},
"inner_hits": {}
}
}
]
}
}
}
Notice the inner_hits
param under each subquery and that the _source
param is limited so that we don't return the whole hit, but rather only the subgroups that did match. type
and component.id
cannot be "seen" in the nested fields so we've included them explicitly.
The response should then look like this:
You now have precisely the attributes you need so a bit of post-processing will get you the desired format!
I'm not familiar w/ a cleaner way of doing this but if any of y'all do, I'd be glad to learn it.