elasticsearch elastic-stack elasticsearch-aggregation

How to get aggregations grouped by index in a multi-index search?


I have an aggregation query which works for a single index. The aggs section looks like this:

"aggs":{  
    "my_buckets":{  
      "composite":{  
        "size":1000,
        "sources":[  
          {  
            "checksumField":{  
              "terms":{  
               "field":"checkSum.keyword"
              }
            }
          }
        ]
      },
      "aggs":{  
        "catagories":{  
          "top_hits":{  
            "sort":[  
              {  
                "createdDate":{  
                  "order":"desc"
                 }
              }
            ],
            "size":1,
            "_source":[  
             "some_field"
            ]
          }
        }
      }
    }
  }

This works as needed for a single index. But when I pass multiple indices as comma-separated values in the GET URI and the first index alone has many entries (say 1000), I don't see any results from the other indices, because the maximum size of my aggregation result is set to 1000. What I need is the top hits from all indices (say the top 500 from each index if there are two indices). How do I modify the aggs body to get that kind of aggregation result?


Solution

  • I found a solution to the problem. The following aggs section returns composite buckets grouped by index:

    GET index1,index2,index3/type/_search
    
     "aggs": {
        "my_buckets": {
          "composite": {
            "size": 3,
            "sources": [
              {
                "indexAgg": {
                  "terms": {
                    "field": "_index"
                  }
                }
              }
            ]
          },
          "aggs": {
            "checksumField": {
              "terms": {
                "field": "checkSum.keyword",
                "size":2
              },
              "aggs": {
                "catagories": {
                  "top_hits": {
                    "sort": [
                      {
                        "createdDate": {
                          "order": "desc"
                        }
                      }
                    ],
                    "size": 1,
                    "_source": [
                      "some_field"
                    ]
                  }
                }
              }
            }
          }
        }
      }
    

    The resulting aggregation produces three top-level buckets (for three templates, one bucket per index), and inside each are 2 buckets on the checksum field, each holding the top hit as returned by the original query in the question. The 2 is only an example: the real size is something I need to calculate from the number of templates provided, by dividing 1000 evenly among them. With these changes, I am able to get a fixed count of hits per index.
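
The size calculation described above could be sketched as follows. This is only a minimal illustration, not the original poster's code: the function name `build_per_index_aggs` and the parameters `total_size` and `source_fields` are hypothetical, and the integer division simply drops any remainder (for example 1000 over three indices gives 333 each). The field and aggregation names are copied from the query above.

```python
import json


def build_per_index_aggs(indices, total_size=1000, source_fields=("some_field",)):
    """Build the aggs body with an equal share of total_size per index.

    Hypothetical helper: splits total_size evenly across the given indices
    and uses that share as the size of the inner terms aggregation.
    """
    if not indices:
        raise ValueError("at least one index is required")

    # Evenly divide the overall budget; any remainder is dropped.
    per_index = max(1, total_size // len(indices))

    return {
        "aggs": {
            "my_buckets": {
                # One composite bucket per index, keyed on the _index field.
                "composite": {
                    "size": len(indices),
                    "sources": [
                        {"indexAgg": {"terms": {"field": "_index"}}}
                    ],
                },
                "aggs": {
                    # Inside each index bucket: up to per_index checksum buckets.
                    "checksumField": {
                        "terms": {
                            "field": "checkSum.keyword",
                            "size": per_index,
                        },
                        "aggs": {
                            # Newest document for each checksum value.
                            "catagories": {
                                "top_hits": {
                                    "sort": [
                                        {"createdDate": {"order": "desc"}}
                                    ],
                                    "size": 1,
                                    "_source": list(source_fields),
                                }
                            }
                        },
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    indices = ["index1", "index2", "index3"]
    body = build_per_index_aggs(indices)
    # The target for this body would be the comma-joined index list.
    print("GET " + ",".join(indices) + "/_search")
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body of the multi-index search. If the remainder matters, it could be handed to one of the indices instead of being dropped, but that is a design choice outside what the answer describes.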