Tags: elasticsearch, elasticsearch-aggregation, elasticsearch-dsl, elasticsearch-7

Average of differences calculated between two date fields


I'm working on a project that uses Elasticsearch to store data and show some complex statistics.

I have an index that looks like this:

Reservation {
  id: number
  check_in: Date
  check_out: Date
  created_at: Date
  // other fields...
}

I need to calculate the average difference in days between check_in and created_at for my Reservations within a specific date range, and show the result as a single number.

I tried this query:

{
  "script_fields": {
    "avgDates": {
      "script": {
        "lang": "expression",
        "source": "doc['created_at'].value - doc['check_in'].value"
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "created_at": {
              "gte": "{{lastMountTimestamp}}",
              "lte": "{{currentTimestamp}}"
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "avgBetweenDates": {
      "avg": {
        "field": "avgDates"
      }
    }
  }
}

Date fields are saved in ISO 8601 format (e.g. 2020-03-11T14:25:15+00:00); I don't know if this could cause issues.

It matches some hits, so the query itself works, but it always returns null as the value of the avgBetweenDates aggregation.

I need a result like this:

"aggregations": {
    "avgBetweenDates": {
        "value": 3.14159 // π is just an example!
    }
}

Any ideas will help!

Thank you.


Solution

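  • Script fields are not indexed fields in ES. They are computed on the fly for each returned hit during the fetch phase, after the aggregations have already run (and with "size": 0 no hits are fetched at all). Aggregations can only read mapped fields or run a script of their own, so "field": "avgDates" points at a field that does not exist in the mapping, and the avg aggregation returns null.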

    You can simply move the script logic into the Average Aggregation, as shown below. For the sake of understanding, I've created a sample mapping, documents, a query and its response. To limit the calculation to your date range, keep your range query in the same request next to aggs.

    Mapping:

    PUT my_date_index
    {
      "mappings": {
        "properties": {
          "check_in":{
            "type":"date",
            "format": "date_time"
          },
          "check_out":{
            "type": "date",
            "format": "date_time"
          },
          "created_at":{
            "type": "date",
            "format": "date_time"
          }
        }
      }
    }
    

    Sample Documents:

    POST my_date_index/_doc/1
    {
      "check_in": "2019-01-15T00:00:00.000Z",
      "check_out": "2019-01-20T00:00:00.000Z",
      "created_at": "2019-01-17T00:00:00.000Z"
    }
    
    POST my_date_index/_doc/2
    {
      "check_in": "2019-01-15T00:00:00.000Z",
      "check_out": "2019-01-22T00:00:00.000Z",
      "created_at": "2019-01-20T00:00:00.000Z"
    }
    

    Aggregation Query:

    POST my_date_index/_search
    {
      "size": 0,
      "aggs": {
        "my_dates_diff": {
          "avg": {
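            "script": """
              // read both dates from the document's doc values
              ZonedDateTime d1 = doc['created_at'].value;
              ZonedDateTime d2 = doc['check_in'].value;
              // signed difference (check_in - created_at) in milliseconds
              long differenceInMillis = ChronoUnit.MILLIS.between(d1, d2);
              // 86400000 ms per day; long division drops partial days,
              // divide by 86400000.0 instead if you want to keep them
              return Math.abs(differenceInMillis/86400000);
            """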
          }
        }
      }
    }
    

    Note that, since you wanted the difference in days, the script converts the millisecond difference to days by dividing by 86400000 (the number of milliseconds in a day).

    Response:

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "my_dates_diff" : {
          "value" : 3.5              <---- Average in Number of Days
        }
      }
    }
    

    Hope this helps!
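If you want to sanity-check the arithmetic outside Elasticsearch, here is a minimal Python sketch. It is only an illustration under stated assumptions, not something Elasticsearch runs: the helper names (parse_iso, diff_millis, diff_days, average_days) are hypothetical, and it assumes Python 3.7+ for datetime.fromisoformat. It computes one value per document and then takes the mean, which is what the avg aggregation does with a script. It also shows that an ISO 8601 value with an offset, like 2020-03-11T14:25:15+00:00, parses to an absolute instant, and that integer division by 86400000 drops partial days while dividing by 86400000.0 keeps them.

```python
from datetime import datetime, timedelta

MILLIS_PER_DAY = 86_400_000  # same constant the Painless script divides by


def parse_iso(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2020-03-11T14:25:15+00:00."""
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z",
    # so rewrite it as the equivalent "+00:00" offset.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts)


def diff_millis(created_at: str, check_in: str) -> int:
    """Milliseconds from created_at to check_in, like ChronoUnit.MILLIS.between(d1, d2)."""
    delta = parse_iso(check_in) - parse_iso(created_at)
    return delta // timedelta(milliseconds=1)


def diff_days(created_at: str, check_in: str, fractional: bool = False) -> float:
    """Absolute difference in days for one document (hypothetical helper)."""
    millis = abs(diff_millis(created_at, check_in))
    if fractional:
        # Dividing by a float keeps partial days (e.g. 1.5).
        return millis / MILLIS_PER_DAY
    # Mirrors Math.abs(differenceInMillis / 86400000): Java long division
    # truncates toward zero, which equals floor division on the absolute value.
    return millis // MILLIS_PER_DAY


def average_days(docs, fractional: bool = False) -> float:
    """One value per document, then the mean, as the avg aggregation does."""
    values = [diff_days(d["created_at"], d["check_in"], fractional) for d in docs]
    return sum(values) / len(values)


# The two sample documents indexed in the answer above.
SAMPLE_DOCS = [
    {"check_in": "2019-01-15T00:00:00.000Z", "created_at": "2019-01-17T00:00:00.000Z"},
    {"check_in": "2019-01-15T00:00:00.000Z", "created_at": "2019-01-20T00:00:00.000Z"},
]

print(average_days(SAMPLE_DOCS))  # 3.5, matching the aggregation response

# Hypothetical reservation created 1.5 days after its check_in date.
partial_day = {"check_in": "2019-01-15T00:00:00.000Z", "created_at": "2019-01-16T12:00:00.000Z"}
print(diff_days(partial_day["created_at"], partial_day["check_in"]))                   # 1
print(diff_days(partial_day["created_at"], partial_day["check_in"], fractional=True))  # 1.5
```

With the two sample documents both variants give 3.5, because each difference is a whole number of days. They only diverge once a difference includes a partial day, as in the hypothetical third reservation.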