Search code examples
elasticsearchelasticsearch-aggregation

Translate MySQL aggregation query to ElasticSearch


I have a comments table that over the past year has grown considerably and I'm moving it to ElasticSearch.

The problem is that I need to adapt a query that I currently have in MySQL which returns the total number of comments for each day in the last 7 days for a given post.

Here's the MySQL query that I have now:

SELECT count(*) AS number, DATE(created_at) AS date 
FROM `comments` 
WHERE `post_id` = ? 
GROUP BY `date` 
ORDER BY `date` DESC 
LIMIT 7

My index looks like this:

{
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "post_id": {
          "type": "integer"
        },
        "subject": {
          "analyzer": "custom_html_strip",
          "type": "text"
        },
        "body": {
          "analyzer": "custom_html_strip",
          "type": "text"
        },
        "created_at": {
          "format": "yyyy-MM-dd HH:mm:ss",
          "type": "date"
        }
      }
    }
  }
}

Is it possible to reproduce that query for ElasticSearch? If so, how would that look like?

My ElasticSearch knowledge is kinda limited, I know that it offers aggregation, but I don't really know how to put it all together.


Solution

  • Use the following query to get all the comments on a given "post_id" for the last 7 days.

    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "id": {
                  "value": "the_post_id"
                }
              }
            },
            /*** only include this clause if you want the recent most 7 days ***/
            {
              "range": {
                "created_at": {
                  "gte": "now-7d/d",
                  "lt": "now/d"
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "posts_per_day": {
          "date_histogram": {
            "field": "created_at",
            "calendar_interval": "day",
            "order" : {"_key" : "desc"}
          }
        }
      }
    }
    

    From the aggregation, pick up the first 7 buckets in your client application.