Search code examples
datetimeelasticsearchtimezoneelasticsearch-aggregationelasticsearch-dsl

Elasticsearch aggregation different timezone


I'm trying to see a histogram divided by hour of a day.

The problem is the timezone: if I put a document at 8am in America and at 8am in Europe I have different time zone but in the graph I want aggregate them at the same hour.

So my result should be 2 doc count at 8am

My actual data:

[
  {
    "_index" : "test_index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "my_time" : "2021-09-17T08:00:00.000-04:00",
      "value" : 1
    }
  },
  {
    "_index" : "test_index",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 1.0,
    "_source" : {
      "my_time" : "2021-09-17T08:00:00.000+02:00",
      "value" : 2
    }
  },
  {
    "_index" : "test_index",
    "_type" : "_doc",
    "_id" : "3",
    "_score" : 1.0,
    "_source" : {
      "my_time" : "2021-09-17T03:00:00.000-04:00",
      "value" : 3
    }
  },
  {
    "_index" : "test_index",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 1.0,
    "_source" : {
      "my_time" : "2021-09-17T04:00:00.000-08:00",
      "value" : 4
    }
  },
  {
    "_index" : "test_index",
    "_type" : "_doc",
    "_id" : "5",
    "_score" : 1.0,
    "_source" : {
      "my_time" : "2021-09-17T23:00:00.000+04:00",
      "value" : 5
    }
  }
]

In the mapping

"my_time": {
    "type": "date"
 }

My query:

POST /test_index/_search?size=0
{
  "aggs": {
    "my_results": {
      "date_histogram": {
        "field": "my_time",
        "calendar_interval": "1h"
      }
    }
  }
}

but the results are worng!! For example:

"buckets" : [
    {
      "key_as_string" : "2021-09-17T06:00:00.000Z",
      "key" : 1631858400000,
      "doc_count" : 1
    },
    {
      "key_as_string" : "2021-09-17T07:00:00.000Z",
      "key" : 1631862000000,
      "doc_count" : 1
    },
    ......
]

How can I fix it?


Solution

  • I'm struggling with the same problem.

    For what I've seen, the easiest solution seems to create 2 different date-time fields, one to store the UTC date-time and another one with to store the local-time (as "fake" UTC date-time). The you can use the local-time field for the aggregation by hour of the day and the UTC field for the standard search query.

    But, to be honest, looks to me more like a workaround than a proper solution...