I need to compute a pipeline aggregation in ElasticSearch and I can't figure out how to express it.
Each document has an email address and an amount. I need to output range buckets of amount counts, grouped by unique email.
{ "0 - 99": 300, "100 - 400": 100 ...}
Would basically be the expected output (the keys would be transformed in my application code), indicating that 300 unique emails have cumulatively received at least 99 (amount) across all documents.
Intuitively, I would expect a query like below. However, range does not appear to be a buckets aggregation (or allow buckets_path).
What is the correct approach here?
{
aggs: {
users: {
terms: {
field: "email"
},
aggs: {
amount_received: {
sum: {
field: "amount"
}
}
}
},
amount_ranges: {
range: {
buckets_path: "users>amount_received",
ranges: [
{ to: 99.0 },
{ from: 100.0, to: 299.0 },
{ from: 300.0, to: 599.0 },
{ from: 600.0 }
]
}
}
}
}
There's no pipeline aggregation that does that directly. However, I think I came up with a solution that should suit your needs, it goes like this. The idea is to repeat the same terms/sum
aggregation and then use a bucket_selector
pipeline aggregation for each of the ranges you're interested in.
POST index/_search
{
"size": 0,
"aggs": {
"users_99": {
"terms": {
"field": "email",
"size": 1000
},
"aggs": {
"amount_received": {
"sum": {
"field": "amount"
}
},
"-99": {
"bucket_selector": {
"buckets_path": {
"amountReceived": "amount_received"
},
"script": "params.amountReceived < 100"
}
}
}
},
"users_100_299": {
"terms": {
"field": "email",
"size": 1000
},
"aggs": {
"amount_received": {
"sum": {
"field": "amount"
}
},
"100-299": {
"bucket_selector": {
"buckets_path": {
"amountReceived": "amount_received"
},
"script": "params.amountReceived >= 100 && params.amountReceived < 300"
}
}
}
},
"users_300_599": {
"terms": {
"field": "email",
"size": 1000
},
"aggs": {
"amount_received": {
"sum": {
"field": "amount"
}
},
"300-599": {
"bucket_selector": {
"buckets_path": {
"amountReceived": "amount_received"
},
"script": "params.amountReceived >= 300 && params.amountReceived < 600"
}
}
}
},
"users_600": {
"terms": {
"field": "email",
"size": 1000
},
"aggs": {
"amount_received": {
"sum": {
"field": "amount"
}
},
"600": {
"bucket_selector": {
"buckets_path": {
"amountReceived": "amount_received"
},
"script": "params.amountReceived >= 600"
}
}
}
}
}
}
In the results, the number of buckets in the users_99
will be the number of unique emails that have an amount less than 99. Similarly, users_100_299
will contain as many buckets as there are unique emails with amounts between 100 and 300. And so on...