Search code examples
elasticsearchelasticsearch-aggregationelasticsearch-date

Convert timestamps to datetime for use in Elasticsearch aggregations


I have an index of SendGrid event data:

"_source": {
    "externalId": "9283cc1d-b003-xxxx-a5af-84fcf31c4181",
    "email": "[email protected]",
    "timestamp": 1616515214,
    "event": "processed",
    "uid": null,
    "id": null,
    "sendgridEventId": null,
    "smtpId": null,
    "sgMessageId": null,
    "sgEventId": null,
    "sendgridEvent": null,
    "type": null,
    "category": [],
    "reason": null,
    "status": null,
    "url": null,
    "useragent": null,
    "ip": null,
    "response": null,
    "tls": null,
    "attempt": null,
    "sendAt": null,
    "asmGroupId": null
}

Now I like to aggregate all of these events for a given day using the timestamp attribute.

GET /sendgridevententity/_search
{
  "query":
   {
    "match_all": {}
   },
   "aggs": {
     "amount_per_day": {
       "date_histogram": {
         "field": "timestamp",
         "calendar_interval": "1d"
       }
     }
   }
}  

Unfortunately, this just yields all the single events as they all have a different timestamp and the aggregation does not group them by day.

How can I convert the timestamps to date and then run the aggregation?


Solution

  • You can take advantage of a multi field mapping. Here's how it works.

    1. Update the existing mapping with a new date "sub-field". I'm assuming timestamp was originally mapped as a long. I'm also assuming the timestamps are in epoch seconds, thereby the explicitly set format:
    POST sendgridevententity/_mapping
    {
      "properties": {
        "timestamp": {
          "type": "long",
          "fields": {
            "as_date": {
              "type": "date",
              "format": "epoch_second"
            }
          }
        }
      }
    }
    
    1. This new property now needs to be picked up and your data needs to be reindexed. You can trigger a reindex call via a cool little trick — sending an empty _update_by_query request:
    POST sendgridevententity/_update_by_query
    
    1. After the reindex operation finishes, you can target the new date field through the dot notation:
    GET /sendgridevententity/_search
    {
      "size": 0, 
      "query": {
        "match_all": {}
      },
      "aggs": {
        "amount_per_day": {
          "date_histogram": {
            "field": "timestamp.as_date",
            "format": "yyyy-MM-dd", 
            "calendar_interval": "1d"
          }
        }
      }
    }
    

    ⚠️ Depending your index size and many other factors, the _update_by_query request may appear to time out. It's possible to set wait_for_completion=false which'll trigger an asynchronous background task instead.

    💡 Note that I used size: 0 in the final request. It's a handy tool for returning only the aggregation results.