Search code examples
elasticsearchaggregationelasticsearch-6

Elasticsearch Terms Aggregation - for dynamic keys of an object


Documents Structure

Doc_1 {
"title":"hello",
"myObject":{
 "key1":"value1",
 "key2":"value2"
 }
}
Doc_2 {
"title":"hello world",
"myObject":{
 "key2":"value4",
 "key3":"value3"
 }
}
Doc_3 {
"title":"hello world2",
"myObject":{
 "key1":"value1",
 "key3":"value3"
 }
}

Information: myObject contains dynamic key-value pair.

Objective: My objective is to write an aggregation query to return the number of unique all dynamic key-value pairs.

Attempt and explanation: I can easily get results for known keys in this way.

       {
        "size":0,
        "query":{
               "match":{"title":"hello"}
                },
        "aggs":{
               "key1Agg":{
                    "terms":{"field":"myObject.key1.keyword"}
                },
               "key2Agg":{
                    "terms":{"field":"myObject.key2.keyword"}
                },
               "key3Agg":{
                    "terms":{"field":"myObject.key3.keyword"}
               }
              }
          }

This is the typical result of the above hardcoded nested keys aggregation.

{
...
"aggregations": {
    "key1Agg": {
        ...        
        "buckets": [
            {
                "key": "value1",
                "doc_count": 2
            }

        ]
    },
    "key2Agg": {
        ...
        "buckets": [
            {
                "key": "value2",
                "doc_count": 1
            },
            {
                "key": "value4",
                "doc_count": 1
            }

        ]
    },
    "key3Agg": {
       ...
        "buckets": [
            {
                "key": "value3",
                "doc_count": 2
            }

        ]
    }
}

}

Now all I want is to return the count of all dynamic key-value pairs, i.e without putting any hardcore key names in an aggregation query.

I am using ES 6.3, Thanks in Advance!!


Solution

  • From the information you have provided, it appears that myObject seems to be of object datatype and not nested datatype.

    Well, there is no easy way to do without modifying the data you have, what you can do and possibly the simplest solution would be is to include an additional field say let's call it as myObject_list which would be of type keyword where the documents would be as follows:

    Sample Documents:

    POST test_index/_doc/1
    {
     "title":"hello",
      "myObject":{
       "key1":"value1",
       "key2":"value2"
      },
      "myObject_list": ["key1_value1", "key2_value2"]     <--- Note this
    }
    
    POST test_index/_doc/2
    {
     "title":"hello world",
      "myObject":{
       "key2":"value4",
       "key3":"value3"
      },
      "myObject_list": ["key2_value4", "key3_value3"]     <--- Note this
    }
    
    POST test_index/_doc/3
    {
     "title":"hello world2",
      "myObject":{
       "key1":"value1",
       "key3":"value3"
      },
      "myObject_list": ["key1_value1", "key3_value3"]     <--- Note this
    }
    

    You can have a query as simple as below:

    Request Query:

    POST test_index/_search
    {
      "size": 0,
      "aggs": {
        "key_value_aggregation": {
          "terms": {
            "field": "myObject_list",              <--- Make sure this is of keyword type
            "size": 10
          }
        }
      }
    }
    

    Note that I've used Terms Aggregation over here.

    Response:

    {
      "took" : 406,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "key_value_aggregation" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "key1_value1",
              "doc_count" : 2
            },
            {
              "key" : "key3_value3",
              "doc_count" : 2
            },
            {
              "key" : "key2_value2",
              "doc_count" : 1
            },
            {
              "key" : "key2_value4",
              "doc_count" : 1
            }
          ]
        }
      }
    }
    

    Hope this helps!