Search code examples
elasticsearchelasticsearch-aggregation

Range ElasticSearch Aggregation


I need to compute a pipeline aggregation in ElasticSearch and I can't figure out how to express it.

Each document has an email address and an amount. I need to output range buckets of amount counts, grouped by unique email.

{ "0 - 99": 300, "100 - 400": 100 ...}

Would basically be the expected output (the keys would be transformed in my application code), indicating that 300 unique emails have cumulatively received at least 99 (amount) across all documents.

Intuitively, I would expect a query like below. However, range does not appear to be a buckets aggregation (or allow buckets_path).

What is the correct approach here?

{
 aggs: {
   users: {
     terms: {
       field: "email"
     },
     aggs: {
       amount_received: {
         sum: {
           field: "amount"
         }
       }
     }
   },
   amount_ranges: {
     range: {
       buckets_path: "users>amount_received",
       ranges: [
           { to: 99.0 },
           { from: 100.0, to: 299.0 },
           { from: 300.0, to: 599.0 },
           { from: 600.0 }
       ]
     }
   }
}
  }

Solution

  • There's no pipeline aggregation that does that directly. However, I think I came up with a solution that should suit your needs, it goes like this. The idea is to repeat the same terms/sum aggregation and then use a bucket_selector pipeline aggregation for each of the ranges you're interested in.

    POST index/_search
    {
      "size": 0,
      "aggs": {
        "users_99": {
          "terms": {
            "field": "email",
            "size": 1000
          },
          "aggs": {
            "amount_received": {
              "sum": {
                "field": "amount"
              }
            },
            "-99": {
              "bucket_selector": {
                "buckets_path": {
                  "amountReceived": "amount_received"
                },
                "script": "params.amountReceived < 100"
              }
            }
          }
        },
        "users_100_299": {
          "terms": {
            "field": "email",
            "size": 1000
          },
          "aggs": {
            "amount_received": {
              "sum": {
                "field": "amount"
              }
            },
            "100-299": {
              "bucket_selector": {
                "buckets_path": {
                  "amountReceived": "amount_received"
                },
                "script": "params.amountReceived >= 100 && params.amountReceived < 300"
              }
            }
          }
        },
        "users_300_599": {
          "terms": {
            "field": "email",
            "size": 1000
          },
          "aggs": {
            "amount_received": {
              "sum": {
                "field": "amount"
              }
            },
            "300-599": {
              "bucket_selector": {
                "buckets_path": {
                  "amountReceived": "amount_received"
                },
                "script": "params.amountReceived >= 300 && params.amountReceived < 600"
              }
            }
          }
        },
        "users_600": {
          "terms": {
            "field": "email",
            "size": 1000
          },
          "aggs": {
            "amount_received": {
              "sum": {
                "field": "amount"
              }
            },
            "600": {
              "bucket_selector": {
                "buckets_path": {
                  "amountReceived": "amount_received"
                },
                "script": "params.amountReceived >= 600"
              }
            }
          }
        }
      }
    }
    

    In the results, the number of buckets in the users_99 will be the number of unique emails that have an amount less than 99. Similarly, users_100_299 will contain as many buckets as there are unique emails with amounts between 100 and 300. And so on...