
Elasticsearch: percentiles of an aggregation


I have an index with 3 fields: user_id, count, timestamp.

I would like to aggregate count by user_id, which is easy with Elasticsearch. However, I also want to compute a percentile rank over the resulting aggregated values.

Is this possible?


Solution

  • Yes. This can be done with the Percentiles Bucket aggregation (percentiles_bucket), a sibling pipeline aggregation that calculates percentiles across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.

    A percentiles_bucket aggregation looks like this in isolation:

    {
        "percentiles_bucket": {
            "buckets_path": "the_sum"
        }
    }
    

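    The following snippet calculates the 25th, 50th and 75th percentiles of the monthly sales totals, that is, of the "sales" sum inside each "sales_per_month" bucket. The buckets_path parameter (singular) uses ">" to step from the sibling aggregation into the metric inside its buckets. (calendar_interval requires Elasticsearch 7.2 or later; older versions use "interval" instead.)

    {
        "aggs" : {
            "sales_per_month" : {
                "date_histogram" : {
                    "field" : "date",
                    "calendar_interval" : "month"
                },
                "aggs": {
                    "sales": {
                        "sum": {
                            "field": "price"
                        }
                    }
                }
            },
            "percentiles_monthly_sales": {
                "percentiles_bucket": {
                    "buckets_path": "sales_per_month>sales",
                    "percents": [ 25.0, 50.0, 75.0 ]
                }
            }
        }
    }
    

    The response may look like this (the sales values are illustrative):

    {
        "aggregations": {
            "sales_per_month": {
                "buckets": [
                    {
                        "key_as_string": "2015/01/01 00:00:00",
                        "key": 1420070400000,
                        "doc_count": 3,
                        "sales": {
                            "value": 550.0
                        }
                    },
                    {
                        "key_as_string": "2015/02/01 00:00:00",
                        "key": 1422748800000,
                        "doc_count": 2,
                        "sales": {
                            "value": 60.0
                        }
                    },
                    {
                        "key_as_string": "2015/03/01 00:00:00",
                        "key": 1425168000000,
                        "doc_count": 2,
                        "sales": {
                            "value": 375.0
                        }
                    }
                ]
            },
            "percentiles_monthly_sales": {
                "values": {
                    "25.0": 375.0,
                    "50.0": 375.0,
                    "75.0": 550.0
                }
            }
        }
    }

    The percentiles are computed exactly over the bucket values and are not interpolated, so every reported percentile is one of the bucket values themselves (here 60, 375 or 550).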
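Applied to the index from the question, a search request body along these lines should work. It is a sketch under assumptions: user_id is assumed to be a keyword or numeric field (a text field would need its keyword sub-field instead), the per-user "count" values are assumed to be summed, and the aggregation names per_user, total_count and total_count_percentiles as well as the percents list are hypothetical choices. Because percentiles_bucket only sees the buckets that the terms aggregation actually returns, and terms returns 10 buckets by default, size is raised here; 10000 is an arbitrary assumption and must be at least the number of distinct users while staying within the cluster's bucket limits.

```json
{
    "size": 0,
    "aggs": {
        "per_user": {
            "terms": {
                "field": "user_id",
                "size": 10000
            },
            "aggs": {
                "total_count": {
                    "sum": { "field": "count" }
                }
            }
        },
        "total_count_percentiles": {
            "percentiles_bucket": {
                "buckets_path": "per_user>total_count",
                "percents": [ 25.0, 50.0, 75.0, 95.0 ]
            }
        }
    }
}
```

Note that percentiles_bucket reports percentile values of the per-user totals rather than a percentile rank for each individual user; placing a given user's total on that scale would still have to be done against these returned values (or client-side).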