"eventTime": {
"type": "long"
},
POST some_indices/_search
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"range": {
"eventTime": {
"from": 1563120000000,
"to": 1565712000000,
"format": "epoch_millis"
}
}
}
}
},
"aggs": {
"min_eventTime": { "min" : { "field": "eventTime"} },
"max_eventTime": { "max" : { "field": "eventTime"} },
"time_series": {
"histogram": {
"field": "eventTime",
"interval": 86400000,
"min_doc_count" : 0,
"extended_bounds": {
"min": 1563120000000,
"max": 1565712000000
}
}
}
}
}
"aggregations": {
"max_eventTime": {
"value": 1565539199997
},
"min_eventTime": {
"value": 1564934400000
},
"time_series": {
"buckets": [
{
"key": 1563062400000,
"doc_count": 0
},
{
"key": 1563148800000,
"doc_count": 0
},
{
...
As the reference clearly mentions:
For filtering buckets, one should nest the histogram aggregation under a range filter aggregation with the appropriate from/to settings.
I set the filter properly (as the demo does), and the min_eventTime and max_eventTime values confirm that the filter works.
But why is the first key still SMALLER THAN the from value (or min_eventTime)?
This is so weird, and I'm totally lost now ;(
Any advice will be appreciated ;)
I hacked out a solution for now, but I kind of think it's a bug in Elasticsearch.
I am using date_histogram instead, even though the field itself is of type long, and with offset I moved the starting point forward to the right timestamp.
"aggs": {
"time_series": {
"date_histogram": {
"field": "eventTime",
"interval": 86400000,
"offset": "+16h",
"min_doc_count": 0,
"extended_bounds": {
"min": 1563120000000,
"max": 1565712000000
}
},
"aggs": {
"order_amount_total": {
"sum": {
"field": "order_amount"
}
}
}
}
}
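As a quick sanity check on the "+16h" offset (a small sketch of the arithmetic, not anything Elasticsearch runs), 16 hours in milliseconds is exactly the remainder of extended_bounds.min modulo the one-day interval, which is why shifting the buckets by that amount makes them start at 1563120000000. The class name OffsetCheck is just illustrative.

```java
import java.time.Duration;

public class OffsetCheck {
    // One day in milliseconds: the "interval" used in the question
    static final long INTERVAL_MS = Duration.ofDays(1).toMillis();

    // Millisecond equivalent of the date_histogram offset "+16h"
    static long offsetMillis() {
        return Duration.ofHours(16).toMillis();
    }

    public static void main(String[] args) {
        long min = 1563120000000L; // extended_bounds.min from the question
        System.out.println(offsetMillis());    // 57600000
        System.out.println(min % INTERVAL_MS); // 57600000
    }
}
```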
Thanks to @Val's help, I rethought it and ran the following test:
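import org.junit.Test;

public class BucketAlignmentTest {
    @Test
    public void testComputation() {
        System.out.println(1563120000000L % 86400000L); // 57600000: min is 16h past a day boundary
        System.out.println(1563062400000L % 86400000L); // 0: the first bucket key is day-aligned
    }
}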
To quote from the docs:
With extended_bounds setting, you now can "force" the histogram aggregation to start building buckets on a specific min value and also keep on building buckets up to a max value (even if there are no documents anymore). Using extended_bounds only makes sense when min_doc_count is 0 (the empty buckets will never be returned if min_doc_count is greater than 0).
But I believe the specific min value should be one of 0, interval, 2 * interval, 3 * interval, ... (that is, a multiple of the interval, unless an offset is set), rather than an arbitrary value like the one I used in the question.
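The alignment follows from how the histogram aggregation rounds values to bucket keys. The Elasticsearch histogram docs describe the rule as bucket_key = Math.floor((value - offset) / interval) * interval + offset. The sketch below re-implements that rule with integer math (an assumption for illustration: Elasticsearch itself computes with doubles) and applies it to extended_bounds.min. The class and method names are hypothetical.

```java
public class HistogramBucketKey {
    // Floor-based rounding rule described for the histogram aggregation:
    // bucket_key = Math.floor((value - offset) / interval) * interval + offset
    static long bucketKey(long value, long interval, long offset) {
        // floorDiv rounds toward negative infinity, matching Math.floor for negative values too
        return Math.floorDiv(value - offset, interval) * interval + offset;
    }

    public static void main(String[] args) {
        long interval = 86_400_000L;   // 1 day in ms
        long min = 1_563_120_000_000L; // extended_bounds.min from the question
        // Without an offset, the bucket containing min starts at the previous multiple of the interval
        System.out.println(bucketKey(min, interval, 0L)); // 1563062400000
    }
}
```

This reproduces the first key in the response above: 1563062400000 is the largest multiple of 86400000 that is not greater than 1563120000000.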
So in my case, I can simply use the offset parameter of histogram to solve the issue, as follows.
I don't actually need date_histogram at all.
"histogram": {
"field": "eventTime",
"interval": 86400000,
"offset": 57600000,
"min_doc_count" : 0,
"extended_bounds": {
"min": 1563120000000,
"max": 1565712000000
}
}
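The offset value 57600000 doesn't have to be worked out by hand. A minimal sketch, assuming the same floor-based rounding rule: Math.floorMod(min, interval) gives the smallest non-negative offset that puts a bucket boundary exactly at min, and in this case both extended bounds then land on bucket keys. alignOffset and bucketKey are hypothetical helpers, not Elasticsearch APIs.

```java
public class OffsetFromMin {
    // Smallest non-negative offset that makes `min` a bucket boundary
    static long alignOffset(long min, long interval) {
        return Math.floorMod(min, interval);
    }

    // Floor-based bucket key rule, as described for the histogram aggregation
    static long bucketKey(long value, long interval, long offset) {
        return Math.floorDiv(value - offset, interval) * interval + offset;
    }

    public static void main(String[] args) {
        long interval = 86_400_000L;   // 1 day in ms
        long min = 1_563_120_000_000L; // extended_bounds.min
        long max = 1_565_712_000_000L; // extended_bounds.max
        long offset = alignOffset(min, interval);
        System.out.println(offset);                           // 57600000
        System.out.println(bucketKey(min, interval, offset)); // 1563120000000
        System.out.println(bucketKey(max, interval, offset)); // 1565712000000
    }
}
```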
A clear explanation posted by Elastic team member @polyfractal (thank you for the detailed, crystal-clear explanation) confirms the same logic; more details can be found here.
I'd like to quote one reason for this design here:
if we cut the aggregation off right at the extended_bounds.min/max, we would generate buckets that are not the full interval and that would break many assumptions about how the histogram works.
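To make the quoted reasoning concrete: without an offset, the bucket that contains extended_bounds.min starts at 1563062400000 and spans a full day, so cutting it off at min would leave a bucket only 8 hours wide. The sketch below computes that width under the same floor-based rounding rule; the class and helper names are hypothetical.

```java
public class FullIntervalBuckets {
    // Floor-based bucket key rule, as described for the histogram aggregation
    static long bucketKey(long value, long interval, long offset) {
        return Math.floorDiv(value - offset, interval) * interval + offset;
    }

    // Width of the part of min's bucket that lies at or after min,
    // i.e. what would remain if the bucket were cut off at min
    static long truncatedWidth(long min, long interval, long offset) {
        long key = bucketKey(min, interval, offset);
        return key + interval - min;
    }

    public static void main(String[] args) {
        long interval = 86_400_000L;   // 1 day in ms
        long min = 1_563_120_000_000L; // extended_bounds.min
        System.out.println(truncatedWidth(min, interval, 0L));        // 28800000 (8 hours)
        System.out.println(truncatedWidth(min, interval, 57600000L)); // 86400000 (full day)
    }
}
```

With the aligned offset, nothing would need to be cut, which is why the offset approach gives buckets that start exactly at extended_bounds.min.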