Search code examples
elasticsearchcassandraelassandra

Aggregation, Date range query in Elassandra/Elastic Search


Getting different results while searching on the date range aggregation indexing.

Created the index like below.

curl -XPUT -H 'Content-Type: application/json' 'http://x.x.x.x:9200/date_index' -d '{
  "settings" : { "keyspace" : "keyspace1"},
  "mappings" : {
    "table1" : {
      "discover":"sent_date",
      "properties" : {
        "sent_date" : { "type": "date", "format": "yyyy-MM-dd HH:mm:ssZZ" }
        }
    }
  }
}'

When trying searching with below code, i am getting different date range results.

    curl -XGET -H 'Content-Type: application/json' 'http://x.x.x.x:9200/date_index/_search?pretty=true' -d '
    {
      "aggs" : {
        "sentdate_range_search" : {
          "date_range" : {
            "field" : "sent_date",
            "time_zone": "UTC",
            "format" : "yyyy-MM-dd HH:mm:ssZZ",
            "ranges" : [
              { "from" : "2010-05-07 11:22:34+0000", "to" : "2011-05-07 11:22:34+0000"}
            ]
      }
    }
  }
}'

Sample output, showing different results like 2039, 2024 etc.

{
  "took" : 26,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 417427,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "date_index",
        "_type" : "table1",
        "_id" : "P89200822_4210021505784",
        "_score" : 1.0,
        "_source" : {
          "sent_date" : "2039-05-22T14:45:39.000Z"
        }
      },
      {
        "_index" : "date_index",
        "_type" : "table1",
        "_id" : "P89200605_4210020537428",
        "_score" : 1.0,
        "_source" : {
           "sent_date" : "2024-06-05T07:20:57.000Z"
        }
      },
      .........
    "aggregations" : {
    "sentdate_range_search" : {
      "buckets" : [
        {
          "key" : "2010-05-07 11:22:34+00:00-2011-05-07 11:22:34+00:00",
          "from" : 1.273231354E12,
          "from_as_string" : "2010-05-07 11:22:34+00:00",
          "to" : 1.304767354E12,
          "to_as_string" : "2011-05-07 11:22:34+00:00",
          "doc_count" : 0
         }
      ]
    }
  }

FYI: I am using the data that was resided in Cassandra Database where the field "sent_date" is stored with UTC timezone.

Please advise, thanks


Solution

  • == Reworked answer based on conversation in the comments ==

    Aggregations are different than search queries. Aggregations combine records (i.e. aggregate!) along specified dimensions. The query in the question aggregates records that fall between the two specified dates into a single bucket. More info on aggregations can be found in the Elasticsearch documentation

    Since the requirement is to filter records that fall between two dates, a date range filter is the appropriate approach:

    GET date_index/_search
    {
       "query": {
           "bool": {
               "filter": {
                   "range": {
                       "sent_date": {
                           "gte": "2010-05-07 11:22:34+0000",
                           "lte": "2011-05-07 11:22:34+0000"
                       }
                   }
                }
            }
        }
    }
    

    Why filter instead of regular query? Filters are faster than searches because they don't contribute to document scoring and they're cacheable. You can combine filters and searches to, for example, get all records within the given time range that match the phrase "all work and no play makes jack a dull boy."