Search code examples
elasticsearchelastic-stackaggregation

How to sort before aggregation in Elasticsearch?


I have an Elasticsearch index structured like this

{
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "fields":{
                    "keyword":{
                        "type":"keyword",
                        "ignore_above":20
                    }
                }
            },
            "result_nums":{
                "type":"integer"
            }
        }
    }
}

and all documents in the index like this

{
  "content": "this",
  "result_nums": 40
},
{
  "content": "this",
  "result_nums": 40
},
{
  "content": "that",
  "result_nums": 40
},
{
  "content": "what",
  "result_nums": 50
},
{
  "content": "what",
  "result_nums": 50
},
{
  "content": "but",
  "result_nums": 100
},
{
  "content": "like",
  "result_nums": 20
}

I need to get the data, sorting by result_nums DESC and removing duplicate "content". For example, I used the query like this to get the first two data

{
    "size": 0,
    "aggs": {
        "content": {
            "terms": {
                "field": "content.keyword",
                "size": 2
            },
            "aggs": {
                "res_nums": {
                    "avg": {
                        "field": "result_nums"
                    }
                },
                "res_sort": {
                    "bucket_sort": {
                        "sort": [
                            {
                                "res_nums": "desc"
                            }
                        ]
                    }
                }
            }
        }
    }
}

The data I expect to get is

                {
                    "key": "but",
                    "doc_count": 1,
                    "res_nums": {
                        "value": 100.0
                    }
                },
                {
                    "key": "what",
                    "doc_count": 2,
                    "res_nums": {
                        "value": 50.0
                    }
                }

But what I actually get is

                {
                    "key": "what",
                    "doc_count": 2,
                    "res_nums": {
                        "value": 50.0
                    }
                },
                {
                    "key": "this",
                    "doc_count": 2,
                    "res_nums": {
                        "value": 40.0
                    }
                }

so I think es needs to be sorted before aggregation, because now it will only be sorted after aggregation, so I got results that did not match expectations.

I tried to use sort before aggregation but no effect

{
"size": 0,
    "sort": [
        {
            "result_nums": "desc"
        }
    ],
    "aggs": {
    ...
    }
...
}

So how to do sort before aggregation?


Solution

  • You need to use max aggregation along with term query to get the data, sorting by result_nums DESC and removing duplicate "content"

    Adding a working example

    Search Query:

    {
      "size": 0,
      "aggs": {
        "content": {
          "terms": {
            "field": "content.keyword",
            "order": {
              "max_num": "desc"
            },
            "size":2
          },
          "aggs": {
            "max_num": {
              "max": {
                "field": "result_nums"
              }
            }
          }
        }
      }
    }
    

    Search Result:

    "aggregations": {
        "content": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 4,
          "buckets": [
            {
              "key": "but",
              "doc_count": 1,
              "max_num": {
                "value": 100.0
              }
            },
            {
              "key": "what",
              "doc_count": 2,
              "max_num": {
                "value": 50.0
              }
            }
          ]
        }