python elasticsearch kibana sampling elasticsearch-aggregation

Elasticsearch Sampling Aggregation: Unknown Key Error

I am currently working with Kibana on more than 6 billion documents and am trying to draw a sample based on the index, which corresponds to the particular day the data was collected.

from elasticsearch import Elasticsearch
es = Elasticsearch(['https://user:secret@localhost:xxx'])

Using the code below to query:

res = es.search(body=body1)
print(f"Got {res['hits']['total']} Hits:")

When I use the body below, the reported hit total covers all 6 billion documents:

body1 = {
            "query": {"match_all": {}}
        }

However, when I add aggregations to the request body, the search fails with the error RequestError(400, 'parsing exception', 'Unknown key for a START_OBJECT in [my_agg].'):

body0 = {
            "query": {"match_all": {}},
            "size": 0,
            "aggs": {
                "my_unbiased_sample": {
                    "diversified_sampler": {
                        "max_docs_per_value" : 3, 
                        "field" : "_index"
                    }
                }
            }, "my_agg": {
                "terms": {
                    "field": "_index"
                }
            }
}

I believe that my problem lies with my second aggregator and not my first diversified sampler. I just want the output from the diversified sampler, but I am being forced to have a second aggregator.

Solution

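You were almost there; the only problem is the nesting. In your body, "my_agg" sits at the top level of the request, next to "aggs", so the parser rejects it as an unknown key. A sub-aggregation has to go inside its parent aggregation, under that aggregation's own "aggs" key: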

{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "my_unbiased_sample": {
      "diversified_sampler": {
        "max_docs_per_value": 3,
        "field": "_index"
      },
      "aggs": {
        "my_agg": {
          "terms": {
            "field": "_index"
          }
        }
      }
    }
  }
}
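
For use from the Python client in the question, the same body can be built and its result read roughly as follows. This is only a sketch: the helper names build_sampled_terms_body and sampled_buckets are made up for illustration, and it assumes the response has the usual aggregations layout, with the terms buckets nested under the sampler's name.

```python
def build_sampled_terms_body(field="_index", max_docs_per_value=3):
    """Build a search body with a terms sub-aggregation nested in the sampler.

    Hypothetical helper; the aggregation names match the ones in the question.
    """
    return {
        "query": {"match_all": {}},
        "size": 0,  # no hits needed, only aggregation results
        "aggs": {
            "my_unbiased_sample": {
                "diversified_sampler": {
                    "max_docs_per_value": max_docs_per_value,
                    "field": field,
                },
                # Sub-aggregations live under the sampler's own "aggs" key,
                # not at the top level of the request body.
                "aggs": {
                    "my_agg": {"terms": {"field": field}},
                },
            }
        },
    }


def sampled_buckets(res):
    """Return {bucket key: doc_count} from the nested terms aggregation.

    Assumes the response nests "my_agg" under "my_unbiased_sample".
    """
    sample = res["aggregations"]["my_unbiased_sample"]
    return {b["key"]: b["doc_count"] for b in sample["my_agg"]["buckets"]}


# With the client from the question (needs a reachable cluster):
# res = es.search(body=build_sampled_terms_body())
# print(sampled_buckets(res))
```

A sampler on its own only reports how many documents it sampled, which is why some sub-aggregation is needed to see anything useful. If you want the sampled documents themselves rather than counts, a top_hits sub-aggregation in the same nested position may be worth trying.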