I am using elastic search to count data, but I run into an issue that the result of using date histogram and the the result of using date range are different from each other.
Here is my query and its result using date histogram and group by date to count data.
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"range": {
"starttime": {
"from": 1686502800000,
"to": null,
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
},
{
"range": {
"starttime": {
"from": null,
"to": 1688317199999,
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "holdCall",
"boost": 1
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"aggregations": {
"group_by_starttime": {
"date_histogram": {
"field": "starttime",
"calendar_interval": "1d",
"offset": 0,
"order": {
"_key": "asc"
},
"keyed": false,
"min_doc_count": 0
},
"aggregations": {
"count_id": {
"value_count": {
"field": "id.keyword"
}
}
}
}
}
}
{
"group_by_starttime": {
"buckets": [
{
"key": 1687564800000,
"doc_count": 4771,
"count_id": {
"value": 4771
}
},
{
"key": 1687651200000,
"doc_count": 25032,
"count_id": {
"value": 25032
}
}
]
}
}
As you can see in 25/06/2023 (1687651200000), the result is 25032.
Here is my query its result using date range to count data in 25/06/2023.
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"range": {
"starttime": {
"from": 1687626000000,
"to": null,
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
},
{
"range": {
"starttime": {
"from": null,
"to": 1687712399999,
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "holdCall",
"boost": 1
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"aggregations": {
"count_id": {
"value_count": {
"field": "id.keyword"
}
}
}
}
{
"count_id": {
"value": 29803
}
}
As you can see the result is 29803, and it is different from the result using date histogram.
How can I solve this issue, please? I did scour the Internet, but there is not anything found.
The discrepancy in the total document count arises from the difference in the time range specified. The "from" time is set to 1687626000000, which corresponds to "24 June 2023 17:00:00" in GMT, while the "end" time is set to "25 June 2023 16:59:59.999" in GMT.
When using the date histogram aggregation, the buckets are created on a daily basis due to the calendar_interval being set to 1 day. As a result, two buckets are created: one for 24th June and another for 25th June.
The total count of 4771 represents the documents within the bucket for 24th June, while the count of 25032 represents the documents within the bucket for 25th June. Adding these values together yields a total count of 29803.
On the other hand, when running a range query without the date histogram aggregation, the query retrieves all the data within the given time range as a whole, without creating separate buckets for each day. Consequently, the resulting count will be different from the aggregation count, as it does not consider the daily breakdown of data.