Search code examples
elasticsearchelasticsearch-dsl

List values of fields that appear in multiple documents ElasticSearch


I have an ElasticSearch database with documents that contain serviceName and documentId fields, like this:

Document 1:

{
"serviceName": 2020,
"documentId": 1
}

Document 2:

{
"serviceName": 2020,
"documentId": 2
}

Document 3:

{
"serviceName": 2019,
"documentId": 1
}

I need to list all serviceName values for each documentId value. So, here the answer would be:

1: 2020, 2019

2: 2020

Can this be done in ElasticSearch?


Solution

  • You can use terms aggregation from Elasticsearch to get your expected result.

    Below is sample query:

    POST 74854188/_search
    {
      "size": 0,
      "aggs": {
        "ID": {
          "terms": {
            "field": "documentId",
            "size": 10
          },
          "aggs": {
            "ServiceName": {
              "terms": {
                "field": "serviceName",
                "size": 10
              }
            }
          }
        }
      }
    }
    

    Output for your sample data given in question:

    "aggregations": {
        "ID": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": 1,
              "doc_count": 2,
              "ServiceName": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": 2019,
                    "doc_count": 1
                  },
                  {
                    "key": 2020,
                    "doc_count": 1
                  }
                ]
              }
            },
            {
              "key": 2,
              "doc_count": 1,
              "ServiceName": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": 2020,
                    "doc_count": 1
                  }
                ]
              }
            }
          ]
        }
      }
    

    If you have defined field with text and keyword both th type then you need to use documentId.keyword and serviceName.keyword, If field is defined as long then above query work as it is.