Tags: elasticsearch, elasticsearch-aggregation, elasticsearch-api

Elasticsearch combining multiple buckets and aggregations


Let's assume the data is reasonably simple. Each document in our index has this structure:

{
    "Time": "2018-01-01T19:35:00.0000000Z",
    "Country": "Germany",
    "Addr": "security.web.com",
    "FailureCount": 5,
    "SuccessCount": 50
}

My question essentially boils down to how to do something like this: https://www.elastic.co/guide/en/elasticsearch/guide/current/_combining_the_two.html. Specifically, I am trying to perform the same aggregation on every combination of Country and Addr. My current attempt is below. It aggregates over 5-minute buckets (that grain is part of my requirements), but so far I have only been able to aggregate over the documents matching a single filter (Country:Germany).

{
"size":0,
"query":{
   "bool":{
      "filter":[
         {
            "range":{
               "Time":{
                  "gte":"1514835300000",
                  "lte":"1514835600000",
                  "format":"epoch_millis"
               }
            }
         },
         {
            "query_string":{
               "analyze_wildcard":true,
               "query":"Country:Germany"
            }
         }
      ]
   }
},
"aggs":{
   "2":{
      "date_histogram":{
         "interval":"5m",
         "field":"Time",
         "min_doc_count":0,
         "extended_bounds":{
            "min":"1514835300000",
            "max":"1514835600000"
         },
         "format":"epoch_millis"
      },
      "aggs":{
         "4":{
            "bucket_script":{
               "buckets_path":{
                  "success":"9",
                  "failure":"10"
               },
               "script":"( params.success + params.failure )"
            }
         },
         "9":{
            "sum":{
               "field":"SuccessCount"
            }
         },
         "10":{
            "sum":{
               "field":"FailureCount"
            }
         }
      }
   }
}

This works, but simply aggregates on all documents that match the bool-filter (over 5-minute buckets). Instead, I wanted to aggregate across all combinations of Country and Addr (over 5-minute buckets).

That is, I would like the metric computed by the bucket_script aggregation "4" for all docs that have "Country": "Germany" and "Addr": "security.web.com", another for all docs that have "Country": "United States" and "Addr": "security.web.com", and so on, for every Addr and Country value. Is this possible in one Elasticsearch request? What might my best option be here?
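To make the desired result concrete, here is a minimal pure-Python sketch (it does not talk to Elasticsearch) of the metric in question: SuccessCount + FailureCount, as in bucket_script "4", per (Country, Addr, 5-minute bucket). It assumes Time is a UTC ISO-8601 string as in the sample document, and that 5-minute buckets start at multiples of 5 minutes since the epoch, which is how a fixed-interval date_histogram with no offset or time_zone buckets them. The function names and sample documents are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

FIVE_MINUTES_MS = 5 * 60 * 1000  # the 5-minute grain, in epoch milliseconds


def bucket_start_ms(iso_time: str) -> int:
    """Round a UTC ISO-8601 timestamp down to the start of its 5-minute bucket."""
    # Parse only 'YYYY-MM-DDTHH:MM:SS'; sub-second digits cannot change a 5-minute bucket.
    dt = datetime.strptime(iso_time[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    ms = int(dt.timestamp()) * 1000
    return ms - ms % FIVE_MINUTES_MS


def totals_by_combination(docs):
    """Sum SuccessCount + FailureCount per (Country, Addr, 5-minute bucket start)."""
    totals = defaultdict(int)
    for doc in docs:
        key = (doc["Country"], doc["Addr"], bucket_start_ms(doc["Time"]))
        totals[key] += doc["SuccessCount"] + doc["FailureCount"]
    return dict(totals)


# Hypothetical sample documents in the question's format.
sample_docs = [
    {"Time": "2018-01-01T19:35:00.0000000Z", "Country": "Germany",
     "Addr": "security.web.com", "FailureCount": 5, "SuccessCount": 50},
    {"Time": "2018-01-01T19:37:30.0000000Z", "Country": "Germany",
     "Addr": "security.web.com", "FailureCount": 1, "SuccessCount": 10},
    {"Time": "2018-01-01T19:36:00.0000000Z", "Country": "United States",
     "Addr": "security.web.com", "FailureCount": 2, "SuccessCount": 20},
]

for (country, addr, start), total in sorted(totals_by_combination(sample_docs).items()):
    print(country, addr, start, total)
```

With these three documents it prints one Germany row (total 66) and one United States row (total 22), both in the bucket starting at 1514835300000, the lower bound used in the query.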

Follow-up

Is it also possible to do this not for all combinations of Addr and Country, but only for specific combinations that I list in the query? Or am I overreaching beyond what ES can do in one request?

Thanks!


Solution

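  • If you want this in a single request, you can nest the aggregations: a terms aggregation on Country, a terms aggregation on Addr inside it, and your original date_histogram (with its sum and bucket_script sub-aggregations) inside that. Keep the time-range filter in the query, but drop the Country:Germany clause so that every country gets its own bucket.

    "aggs": {
        "countries": {
            "terms": {
                "field": "Country",
                "size": 300
            },
            "aggs": {
                "addrs": {
                    "terms": {
                        "field": "Addr",
                        "size": 1000
                    },
                    "aggs": {
                        "2": {
                            "date_histogram": {
                                "interval": "5m",
                                "field": "Time",
                                "min_doc_count": 0,
                                "extended_bounds": {
                                    "min": "1514835300000",
                                    "max": "1514835600000"
                                },
                                "format": "epoch_millis"
                            },
                            "aggs": {
                                "4": {
                                    "bucket_script": {
                                        "buckets_path": { "success": "9", "failure": "10" },
                                        "script": "params.success + params.failure"
                                    }
                                },
                                "9": { "sum": { "field": "SuccessCount" } },
                                "10": { "sum": { "field": "FailureCount" } }
                            }
                        }
                    }
                }
            }
        }
    }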
    

    However, I would not recommend this on a large amount of data: nested terms aggregations this deep can be slow and memory-intensive, because the number of buckets grows with countries × addrs × 5-minute intervals. If you really need this in a single query, create a field at index time that combines Country and Addr into one value, and run a single terms aggregation on that field instead.

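    If you only want specific combinations, put each combination into a filters aggregation as a named filter (for example, a bool filter with a term on Country and a term on Addr) and place your date_histogram under it as a sub-aggregation.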
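With the nested request, each combination's metric sits three levels deep in the response. A minimal Python sketch of flattening it into (Country, Addr, bucket start, total) rows, assuming the aggregation names used in the answer ("countries", "addrs", "2") and the question ("4"). Terms and date_histogram buckets carry their key in "key", and a bucket_script result appears as {"value": ...}. The sample response is hand-written and trimmed, not real output; the function name is hypothetical.

```python
def flatten_nested_response(response: dict) -> list:
    """Turn countries -> addrs -> "2" (date_histogram) -> "4" (bucket_script) into flat rows."""
    rows = []
    for country in response["aggregations"]["countries"]["buckets"]:
        for addr in country["addrs"]["buckets"]:
            for interval in addr["2"]["buckets"]:
                # A bucket_script value may be absent for some buckets (depending on
                # gap_policy), so fall back to None instead of raising KeyError.
                total = interval.get("4", {}).get("value")
                rows.append((country["key"], addr["key"], interval["key"], total))
    return rows


# Hand-written, trimmed response in the shape Elasticsearch returns (not real output).
sample_response = {
    "aggregations": {
        "countries": {"buckets": [
            {"key": "Germany", "doc_count": 2, "addrs": {"buckets": [
                {"key": "security.web.com", "doc_count": 2, "2": {"buckets": [
                    {"key": 1514835300000, "doc_count": 2,
                     "9": {"value": 60.0}, "10": {"value": 6.0}, "4": {"value": 66.0}},
                ]}},
            ]}},
        ]},
    },
}

for row in flatten_nested_response(sample_response):
    print(row)
```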
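A minimal sketch of the combined-field approach, assuming you can enrich each document before it is indexed. The CountryAddr field name, the "|" separator, and the helper names are hypothetical; the mapping fragment assumes Elasticsearch 7+ typeless mappings, and the field is mapped as keyword so a terms aggregation can bucket on it.

```python
import json

SEPARATOR = "|"  # hypothetical; must not occur inside Country or Addr values


def with_combined_key(doc: dict) -> dict:
    """Return a copy of doc with a hypothetical CountryAddr field, to be added before indexing."""
    enriched = dict(doc)
    enriched["CountryAddr"] = f"{doc['Country']}{SEPARATOR}{doc['Addr']}"
    return enriched


def split_combined_key(key: str) -> tuple:
    """Recover (Country, Addr) from a terms-bucket key on CountryAddr."""
    country, addr = key.split(SEPARATOR, 1)
    return country, addr


# Mapping fragment: keyword, so that a terms aggregation can bucket on the field.
mapping = {"properties": {"CountryAddr": {"type": "keyword"}}}

# One terms level instead of two. The question's "2" date_histogram (with its
# sub-aggregations) would be added under an "aggs" key inside "country_addr".
aggs = {"country_addr": {"terms": {"field": "CountryAddr", "size": 1000}}}

doc = {"Time": "2018-01-01T19:35:00.0000000Z", "Country": "Germany",
       "Addr": "security.web.com", "FailureCount": 5, "SuccessCount": 50}
print(with_combined_key(doc)["CountryAddr"])
print(json.dumps(mapping))
print(json.dumps(aggs))
```

Each terms bucket key (for example "Germany|security.web.com") can then be split back into its Country and Addr parts on the client.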
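For the follow-up, a minimal Python sketch that builds the request body: one named filter per (Country, Addr) pair inside a filters aggregation, with the question's date_histogram underneath. It assumes Country and Addr are keyword fields (with default dynamic mapping you would target Country.keyword and Addr.keyword in the term queries); the helper name and the "Country|Addr" bucket names are hypothetical.

```python
import json

# The question's date_histogram and its sub-aggregations, as a Python dict.
# ("interval" is the pre-7.2 parameter; newer versions use "fixed_interval".)
DATE_HISTOGRAM = {
    "date_histogram": {
        "interval": "5m",
        "field": "Time",
        "min_doc_count": 0,
        "extended_bounds": {"min": "1514835300000", "max": "1514835600000"},
        "format": "epoch_millis",
    },
    "aggs": {
        "4": {"bucket_script": {"buckets_path": {"success": "9", "failure": "10"},
                                "script": "params.success + params.failure"}},
        "9": {"sum": {"field": "SuccessCount"}},
        "10": {"sum": {"field": "FailureCount"}},
    },
}


def combination_filters_aggs(pairs):
    """Build a filters aggregation with one named bucket per (Country, Addr) pair."""
    filters = {
        f"{country}|{addr}": {  # bucket name; the naming scheme is arbitrary
            "bool": {"filter": [
                {"term": {"Country": country}},
                {"term": {"Addr": addr}},
            ]}
        }
        for country, addr in pairs
    }
    return {"combinations": {"filters": {"filters": filters}, "aggs": {"2": DATE_HISTOGRAM}}}


body = {
    "size": 0,
    "query": {"range": {"Time": {"gte": "1514835300000", "lte": "1514835600000",
                                 "format": "epoch_millis"}}},
    "aggs": combination_filters_aggs([
        ("Germany", "security.web.com"),
        ("United States", "security.web.com"),
    ]),
}
print(json.dumps(body, indent=2))
```

In the response, the filters buckets come back as an object keyed by these names, each holding its own "2" date_histogram.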