elasticsearch elasticsearch-aggregation elasticsearch-dsl

Number of Documents Per Day bucket and applied some filters

I have documents in elasticsearch in which each document looks something like as follows:

{
  "id": "T12890ADSA12",
  "status": “CREATED”,
  "type": “ABC”,
  "updatedAt": "2020-05-29T18:18:08.483Z",
   "createdAt": "2020-04-30T13:41:25.862Z"
}

For this document structure, I wanted to get all the documents which are having status as CREATED or SCHEDULED and TYPE is ABC. And in these filtered documents, I want to aggregate the number of documents based on currentDate - createdAt in days bucket. Eg.

Created Date as today’s date -> Count of documents created today
Created Date as yesterday’s date -> Count of documents created yesterday

And similarly for last 7 days.

Is there an easy way to do this in a single query?

Solution

Please find the below mapping, sample docs, aggregation query and response:

Mapping:

PUT my_date_index
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "status": {
        "type": "keyword"
      },
      "type": {
        "type": "keyword"
      },
      "updatedAt": {
        "type": "date"
      },
      "createdAt": {
        "type": "date"
      }
    }
  }
}

Sample Documents:

POST my_date_index/_doc/1
{
  "id": "T12890ADSA12",
  "status": "CREATED",
  "type": "ABC",
  "updatedAt": "2020-05-29T18:18:08.483Z",
  "createdAt": "2020-07-06T05:00:00.000Z"
}

POST my_date_index/_doc/2
{
  "id": "T12890ADSA13",
  "status": "SCHEDULED",
  "type": "ABC",
  "updatedAt": "2020-05-29T18:18:08.483Z",
  "createdAt": "2020-07-05T13:41:25.862Z"
}

POST my_date_index/_doc/3
{
  "id": "T12890ADSA14",
  "status": "SCHEDULED",
  "type": "ABC",
  "updatedAt": "2020-05-29T18:18:08.483Z",
  "createdAt": "2020-07-04T06:00:00.000Z"
}

POST my_date_index/_doc/4
{
  "id": "T12890ADSA15",
  "status": "SCHEDULED",
  "type": "ABC",
  "updatedAt": "2020-05-29T18:18:08.483Z",
  "createdAt": "2020-07-03T07:00:00.000Z"
}

Query Request:

POST my_date_index/_search
{
  "size": 0,               <----- Remove this to return documents too
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "type": "ABC"
          }
        },
        {
          "range": {
            "createdAt": {
              "gte": "now-7d",
              "lte": "now"
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "status": "SCHEDULED"
          }
        },
        {
          "term": {
            "status": "CREATED"
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "aggs": {
    "my_date": {
      "date_histogram": {
        "field": "createdAt",
        "calendar_interval": "day",
        "order": {
          "_key": "desc"
        }
      }
    }
  }
}

Note that I've first filtered the documents based on date and the conditions which you've provided.

This would return all the documents. Post which I've applied date histogram query to get the documents for each and every day on that date range.

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_date" : {
      "buckets" : [
        {
          "key_as_string" : "2020-07-06T00:00:00.000Z",
          "key" : 1593993600000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-07-05T00:00:00.000Z",
          "key" : 1593907200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-07-04T00:00:00.000Z",
          "key" : 1593820800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-07-03T00:00:00.000Z",
          "key" : 1593734400000,
          "doc_count" : 1
        }
      ]
    }
  }
}

Hope this helps!