Search code examples
elasticsearchelasticsearch-aggregationelasticsearch-dslelasticsearch-query

Elastic search Average time difference Aggregate Query


I have documents in elasticsearch in which each document looks something like as follows:

{
  "id": "T12890ADSA12",
  "status": "ENDED",
  "type": "SAMPLE",
  "updatedAt": "2020-05-29T18:18:08.483Z",
  "events": [
    {
      "event": "STARTED",
      "version": 1,
      "timestamp": "2020-04-30T13:41:25.862Z"
    },
    {
      "event": "INPROGRESS",
      "version": 2,
      "timestamp": "2020-05-14T17:03:09.137Z"
    },
    {
      "event": "INPROGRESS",
      "version": 3,
      "timestamp": "2020-05-17T17:03:09.137Z"
    },
    {
      "event": "ENDED",
      "version": 4,
      "timestamp": "2020-05-29T18:18:08.483Z"
    }
  ],
  "createdAt": "2020-04-30T13:41:25.862Z"
}

Now, I wanted to write a query in elasticsearch to get all the documents which are of type "SAMPLE" and I can get the average time between STARTED and ENDED of all those documents. Eg. Avg of (2020-05-29T18:18:08.483Z - 2020-04-30T13:41:25.862Z, ....). Assume that STARTED and ENDED event is present only once in events array. Is there any way I can do that?


Solution

  • You can do something like this. The query selects the events of type SAMPLE and status ENDED (to make sure there is a ENDED event). Then the avg aggregation uses scripting to gather the STARTED and ENDED timestamps and subtracts them to return the number of days:

    POST test/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "term": {
                "status.keyword": "ENDED"
              }
            },
            {
              "term": {
                "type.keyword": "SAMPLE"
              }
            }
          ]
        }
      },
      "aggs": {
        "duration": {
          "avg": {
            "script": "Map findEvent(List events, String type) {return events.find(it -> it.event == type);} def started = Instant.parse(findEvent(params._source.events, 'STARTED').timestamp); def ended = Instant.parse(findEvent(params._source.events, 'ENDED').timestamp); return ChronoUnit.DAYS.between(started, ended);"
          }
        }
      }
    }
    

    The script looks like this:

    Map findEvent(List events, String type) {
      return events.find(it -> it.event == type);
    }
    def started = Instant.parse(findEvent(params._source.events, 'STARTED').timestamp);
    def ended = Instant.parse(findEvent(params._source.events, 'ENDED').timestamp); 
    return ChronoUnit.DAYS.between(started, ended);