elasticsearch elasticsearch-aggregation elasticsearch-dsl elasticsearch-painless elasticsearch-scripting

Aggregations with dynamic data / nested_objects


I'm trying to aggregate over dynamically mapped fields in Elasticsearch.

For example:

POST test/_doc/1
{
    "settings": {
        "range": {
            "value": 200,
            "display": "200 km"
        },
        "transmitter": {
            "value": 1.2,
            "display": "1.2 Ghz"
        }
    }
}

The properties under settings are dynamic. Essentially I need a query like this:

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "settings": {
            "terms": {
                "field": "settings.*.display"
            }
        }
    }
}

Since the * wildcard doesn't work in the field path here, I'm wondering whether there's a way to return the field names from a Painless script and then maybe use a pipeline aggregation. I can't find the Painless equivalent of JavaScript's Object.keys(settings).

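I've seen an approach using nested objects, but I'd like to avoid it: there may be many 'settings' properties, and the default limit on nested fields per index (index.mapping.nested_fields.limit) is 50, whereas the limit on nested objects per document (index.mapping.nested_objects.limit) is 10000.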


Solution

  • The Painless equivalent of Object.keys() is .keySet(), because the settings object is exposed to the script as a Map. You can iterate over its keys in a scripted_metric aggregation:

    GET test/_search
    {
      "size": 0,
      "aggs": {
        "dynamic_fields_agg": {
          "scripted_metric": {
            "init_script": "state.map = [:];",
            "map_script": """
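              // settings is a plain Map in _source; skip documents without it
              def source = params._source['settings'];
              if (source != null) {
                for (def key : source.keySet()) {
                  def entry = source[key];
                  // only collect sub-objects that carry a 'display' value
                  if (entry instanceof Map && entry.containsKey('display')) {
                    if (state.map.containsKey(key)) {
                      state.map[key].add(entry.display);
                    } else {
                      state.map[key] = [entry.display];
                    }
                  }
                }
              }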
            """,
            "combine_script": "return state",
            "reduce_script": "return states"
          }
        }
      }
    }
    

    which will yield something like

    {
      "aggregations":{
        "dynamic_fields_agg":{
          "value":[
            {
              "map":{
                "range":[
                  "200 km"
                ],
                "transmitter":[
                  "1.2 Ghz"
                ]
              }
            }
          ]
        }
      }
    }
    

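    The value array holds one state per shard, each with its own map. You can post-process these states in the combine/reduce scripts however you like, for example by merging them into a single map in the reduce script.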
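
The merging step can also be done on the client once the response arrives. The following is a minimal sketch in Python, assuming the response shape shown above. The helper name merge_shard_states and the second shard state in the sample data are hypothetical and only there for illustration. It folds the per-shard maps into terms-like counts per dynamic key.

```python
from collections import Counter, defaultdict


def merge_shard_states(response, agg_name="dynamic_fields_agg"):
    """Merge the per-shard maps into {key: {display value: count}}."""
    merged = defaultdict(Counter)
    for state in response["aggregations"][agg_name]["value"]:
        if not state:
            # Defensive: skip empty or null shard states.
            continue
        for key, displays in state.get("map", {}).items():
            # Counter.update counts each display string in the list.
            merged[key].update(displays)
    return {key: dict(counts) for key, counts in merged.items()}


# First state mirrors the response above; the second one is made up
# to show how values from several shards are combined.
response = {
    "aggregations": {
        "dynamic_fields_agg": {
            "value": [
                {"map": {"range": ["200 km"], "transmitter": ["1.2 Ghz"]}},
                {"map": {"range": ["200 km", "50 km"]}},
            ]
        }
    }
}

buckets = merge_shard_states(response)
print(buckets)
# {'range': {'200 km': 2, '50 km': 1}, 'transmitter': {'1.2 Ghz': 1}}
```

Doing this on the client keeps the Painless scripts small, at the cost of shipping every collected value over the wire; for large indices, merging inside the reduce script is likely the better trade-off.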


    Using nested fields would not bring you much advantage here -- wildcard paths are not allowed there either. I asked that myself some time ago.


    UPDATE -- the inline version, for clients that don't understand the triple-quoted strings of the Kibana Console:

    GET /test/_search
    {
      "size": 0,
      "aggs": {
        "dynamic_fields_agg": {
          "scripted_metric": {
            "init_script": "state.map = [:];",
            "map_script": "def source = params._source['settings'];\nif (source != null) {\n  for (def key : source.keySet()) {\n    def entry = source[key];\n    if (entry instanceof Map && entry.containsKey('display')) {\n      if (state.map.containsKey(key)) {\n        state.map[key].add(entry.display);\n      } else {\n        state.map[key] = [entry.display];\n      }\n    }\n  }\n}",
            "combine_script": "return state",
            "reduce_script": "return states"
          }
        }
      }
    }
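
The escaped one-line script does not have to be written by hand. As a minimal sketch, assuming Python and only its standard json module, the request body can be generated from a readable multi-line script. The helper name build_request is hypothetical, and the script text is a variant of the map script above with a guard for documents that lack settings.

```python
import json

# Readable multi-line Painless source; single-quoted strings avoid
# extra escaping once the script is embedded in JSON.
MAP_SCRIPT = """
def source = params._source['settings'];
if (source != null) {
  for (def key : source.keySet()) {
    def entry = source[key];
    if (entry instanceof Map && entry.containsKey('display')) {
      if (state.map.containsKey(key)) {
        state.map[key].add(entry.display);
      } else {
        state.map[key] = [entry.display];
      }
    }
  }
}
""".strip()


def build_request(map_script, agg_name="dynamic_fields_agg"):
    """Return the search body as a plain dict (hypothetical helper)."""
    return {
        "size": 0,
        "aggs": {
            agg_name: {
                "scripted_metric": {
                    "init_script": "state.map = [:];",
                    "map_script": map_script,
                    "combine_script": "return state",
                    "reduce_script": "return states",
                }
            }
        },
    }


# json.dumps turns the newlines in the script into \n escapes,
# producing a body that any HTTP client can send as-is.
body = json.dumps(build_request(MAP_SCRIPT), indent=2)
print(body)
```

The resulting string can be sent as the body of the _search request with any HTTP client.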