Search code examples
elasticsearchelasticsearch-aggregationelasticsearch-dsl

How to group distinct records in Elasticsearch


I have the following data in my Elasticsearch index:

{
  "title": "Hello from elastic",
  "name": "ABC",
  "j_id": "1",
  "date": '2021-03-02T12:29:31.356514'
},
{
  "title": "Hello from elastic",
  "name": "PQR",
  "j_id": "1",
  "date": '2021-03-02T12:29:31.356514'
},
{
  "title": "Hello from elastic",
  "name": "XYZ",
  "j_id": "2",
  "date": '2021-03-02T12:29:31.356514'
},
{
  "title": "Hello from elastic",
  "name": "MNO",
  "j_id": "3",
  "date": '2021-03-02T12:29:31.356514'
}

Now I want to get unique records on the basis of the id.

The expected output is:

{
    "1": [{
      "title": "Hello from elastic",
      "name": "ABC",
      "j_id": "1",
      "date": '2021-03-02T12:29:31.356514'
    },
    {
      "title": "Hello from elastic",
      "name": "PQR",
      "j_id": "1",
      "date": '2021-03-02T12:29:31.356514'
    }],
    "2": [{
      "title": "Hello from elastic",
      "name": "XYZ",
      "j_id": "2",
      "date": '2021-03-02T12:29:31.356514'
    }],
    "3": [{
      "title": "Hello from elastic",
      "name": "MNO",
      "j_id": "3",
      "date": '2021-03-02T12:29:31.356514'
    }]
  }

I tried an aggregate query but it's giving me only the counts. Also, I want to include latest record in my response.

  1. How can I get sorted, unique records from Elasticsearch grouped by the id?
  2. I want latest inserted data first

Solution

  • Assuming a minimal mapping covering the date and j_id fields:

    PUT myindex
    {
      "mappings": {
        "properties": {
          "j_id": {
            "type": "keyword"
          },
          "date": {
            "type": "date"
          }
        }
      }
    }
    

    you can leverage a terms aggregation whose sub-aggregation is an ordered top_hits aggregation:

    POST myindex/_search?filter_path=aggregations.*.buckets.key,aggregations.*.buckets.sorted_hits.hits.hits._source
    {
      "size": 0,
      "aggs": {
        "by_j_id": {
          "terms": {
            "field": "j_id",
            "size": 10,
            "order": {
              "max_date": "desc"
            }
          },
          "aggs": {
            "max_date": {
              "max": {
                "field": "date"
              }
            },
            "sorted_hits": {
              "top_hits": {
                "size": 10,
                "sort": [
                  {
                    "date": {
                      "order": "desc"
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
    

    The URL parameter filter_path reduces the response body to closely mimic your required format:

    {
      "aggregations" : {
        "by_j_id" : {
          "buckets" : [
            {
              "key" : "1",        
              "sorted_hits" : {
                "hits" : {
                  "hits" : [
                    {
                      "_source" : {
                        "title" : "Hello from elastic",
                        "name" : "ABC",
                        "j_id" : "1",
                        "date" : "2021-03-02T12:29:31.356514"
                      }
                    },
                    {
                      "_source" : {
                        "title" : "Hello from elastic",
                        "name" : "PQR",
                        "j_id" : "1",
                        "date" : "2021-03-02T12:29:31.356514"
                      }
                    }
                  ]
                }
              }
            },
            {
              "key" : "2",
              "sorted_hits" : {
                "hits" : {
                  "hits" : [
                    {
                      "_source" : {
                        "title" : "Hello from elastic",
                        "name" : "XYZ",
                        "j_id" : "2",
                        "date" : "2021-03-02T12:29:31.356514"
                      }
                    }
                  ]
                }
              }
            },
            {
              "key" : "3",
              "sorted_hits" : {
                "hits" : {
                  "hits" : [
                    {
                      "_source" : {
                        "title" : "Hello from elastic",
                        "name" : "MNO",
                        "j_id" : "3",
                        "date" : "2021-03-02T12:29:31.356514"
                      }
                    }
                  ]
                }
              }
            }
          ]
        }
      }
    }