elasticsearch, elastic-stack, elasticsearch-5

How to get distinct records in the hits section from Elasticsearch


I want to get all the distinct records by "departmentNo". Please see the index data below (it is dummy data):

{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 1, "employeeName": "vijay", ...}
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 2, "employeeName": "rathod", ...}
{'departmentNo': 2, 'departmentName': 'Non-Food', 'departmentLoc': "I2", "departmentScore": "6", "employeeid" : 3, "employeeName": "ajay", ...}
{'departmentNo': 2, 'departmentName': 'Non-Food', 'departmentLoc': "I2", "departmentScore": "6", "employeeid" : 4, "employeeName": "kamal", ...}
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 5, "employeeName": "rahul", ...}

I want the following output, one record per department:

{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 1, "employeeName": "vijay", ...}
{'departmentNo': 2, 'departmentName': 'Non-Food', 'departmentLoc': "I2", "departmentScore": "6", "employeeid" : 3, "employeeName": "ajay", ...}

I first tried to get this data in the hits section, but could not find a way. So I tried an aggregation instead, using the query below:

{
  "size": 0,
  "aggs": {
    "Group_By_Dept": {
      "terms": {
        "field": "departmentNo"
      },
      "aggs": {
        "group_docs": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}

The query above returns the data, but I want all the distinct records, and the result must support pagination and sorting. In Elasticsearch 6.x we could use bucket_sort, but I am on 5.6.7, so bucket_sort is not available.

Can I do this some other way? It would be best if I could get the data in the hits section.

(I don't want to change my index mapping. The mapping shown here is a dummy one, but the use case is the same.)


Solution

  • You can do that with field collapsing, which was added in Elasticsearch 5.3 and so is available in 5.6.7:

    {
      "query": { ... },
      "from": 153,
      "size": 27,
      "collapse": {
        "field": "departmentNo"
      }
    }
    

    This returns only one document for each distinct value of that field. You can control which document that is with a standard sort: within each group of collapsed documents, the one that ranks first under the sort order is returned.

    Note that there is an additional feature called inner hits, which you may want to use later to retrieve the other documents in each group. Be aware that it multiplies document fetches and negatively affects performance.
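
To make the behaviour of collapse with sort and pagination concrete, here is a minimal Python sketch. It does not talk to Elasticsearch: build_collapse_query only builds a request body, and emulate_collapse is a hypothetical local stand-in that mimics the documented behaviour (sort, keep the first document per collapse key, then apply from/size) on the dummy data from the question. The match_all query, the ascending sort on employeeid, and both function names are assumptions made for illustration, and the elided "..." fields are left out.

```python
import json
from typing import Any, Dict, List

# Dummy documents from the question (elided fields omitted).
SAMPLE_DOCS: List[Dict[str, Any]] = [
    {"departmentNo": 1, "departmentName": "Food", "employeeid": 1, "employeeName": "vijay"},
    {"departmentNo": 1, "departmentName": "Food", "employeeid": 2, "employeeName": "rathod"},
    {"departmentNo": 2, "departmentName": "Non-Food", "employeeid": 3, "employeeName": "ajay"},
    {"departmentNo": 2, "departmentName": "Non-Food", "employeeid": 4, "employeeName": "kamal"},
    {"departmentNo": 1, "departmentName": "Food", "employeeid": 5, "employeeName": "rahul"},
]


def build_collapse_query(page: int, page_size: int) -> Dict[str, Any]:
    """Build a search body that asks for one hit per departmentNo."""
    return {
        "query": {"match_all": {}},            # assumption: no filtering
        "from": page * page_size,              # offset counted in collapsed hits
        "size": page_size,
        "sort": [{"employeeid": "asc"}],       # assumption: lowest employeeid represents its group
        "collapse": {"field": "departmentNo"},
    }


def emulate_collapse(docs: List[Dict[str, Any]], body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Local emulation only: sort, keep the first doc per collapse key, then paginate."""
    collapse_field = body["collapse"]["field"]
    sort_field, order = next(iter(body["sort"][0].items()))
    ordered = sorted(docs, key=lambda d: d[sort_field], reverse=(order == "desc"))

    seen = set()
    collapsed = []
    for doc in ordered:
        key = doc[collapse_field]
        if key not in seen:                    # first doc in sort order wins for its group
            seen.add(key)
            collapsed.append(doc)

    start = body["from"]
    return collapsed[start:start + body["size"]]


if __name__ == "__main__":
    body = build_collapse_query(page=0, page_size=10)
    print(json.dumps(body, indent=2))          # the body you would send to _search
    for hit in emulate_collapse(SAMPLE_DOCS, body):
        print(hit)
```

With these assumptions the first page contains the "vijay" record for department 1 and the "ajay" record for department 2, which is the output asked for in the question. Changing the sort changes which employee represents each department, and also the order in which the departments themselves are listed.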