Tags: python, elasticsearch, kibana, sampling, elasticsearch-aggregation

Elasticsearch sampling aggregation: "Unknown key" parsing exception


I'm working with Kibana on a cluster holding more than 6 billion documents, and I'm trying to draw a sample per index, where each index corresponds to the day its documents were collected.

from elasticsearch import Elasticsearch
es = Elasticsearch(['https://user:secret@localhost:xxx'])

I run the query with the code below:

res = es.search(body=body1)
print(f"Got {res['hits']['total']} Hits:")

When I use the body below, I get all 6 billion documents:

body1 = {
    "query": {"match_all": {}}
}

However, when I add aggregations to the body, the request fails with RequestError(400, 'parsing exception', 'Unknown key for a START_OBJECT in [my_agg].'):

body0 = {
            "query": {"match_all": {}},
            "size": 0,
            "aggs": {
                "my_unbiased_sample": {
                    "diversified_sampler": {
                        "max_docs_per_value" : 3, 
                        "field" : "_index"
                    }
                }
            }, "my_agg": {
                "terms": {
                    "field": "_index"
                }
            }
}

I believe the problem lies with the second aggregation ("my_agg") and not with the diversified sampler. I only want the output of the diversified sampler, but it seems I am forced to add a second aggregation.


Solution

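  • You were almost there; only the nesting needs fixing. In your body, "my_agg" sits at the top level of the request, next to "aggs", so the parser does not recognise it as a key. Move it into an "aggs" block inside "my_unbiased_sample" so that it becomes a sub-aggregation of the sampler: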

    {
      "query": {
        "match_all": {}
      },
      "size": 0,
      "aggs": {
        "my_unbiased_sample": {
          "diversified_sampler": {
            "max_docs_per_value": 3,
            "field": "_index"
          },
          "aggs": {
            "my_agg": {
              "terms": {
                "field": "_index"
              }
            }
          }
        }
      }
    }
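
    For reference, here is a minimal Python sketch of the same corrected request, assuming the `es` client from the question. The helper names `build_sample_body` and `bucket_counts` are hypothetical, and the response layout is assumed to follow the usual shape for a sampler with a terms sub-aggregation (buckets nested under the sampler's name).

```python
def build_sample_body(field="_index", max_docs_per_value=3):
    """Return a search body with a terms aggregation nested under a diversified sampler."""
    return {
        "query": {"match_all": {}},
        "size": 0,  # skip the hits; only the aggregation results are needed
        "aggs": {
            "my_unbiased_sample": {
                "diversified_sampler": {
                    "max_docs_per_value": max_docs_per_value,
                    "field": field,
                },
                # Sub-aggregations go in an "aggs" key next to the sampler definition,
                # not at the top level of the request body.
                "aggs": {"my_agg": {"terms": {"field": field}}},
            }
        },
    }


def bucket_counts(res, sampler="my_unbiased_sample", agg="my_agg"):
    """Map each terms bucket key to its doc_count within the sampled set."""
    buckets = res["aggregations"][sampler][agg]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


# Usage against a live cluster, with `es` created as in the question:
# res = es.search(body=build_sample_body())
# print(bucket_counts(res))
```

    The sampler only restricts which documents reach its sub-aggregations, which is why some sub-aggregation (here the terms aggregation) is needed to see any output from it.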