Elasticsearch Aggregation most common list of integers


I am looking for an Elasticsearch aggregation (and mapping) that returns the most common list for a given field. For example, for these docs:

{"ToneCurvePV2012": [1,2,3]}
{"ToneCurvePV2012": [1,5,6]}
{"ToneCurvePV2012": [1,7,8]}
{"ToneCurvePV2012": [1,2,3]}

I want the aggregation result to be [1,2,3] (since it appears twice).

So far, every aggregation I have tried returns 1.


Solution

  • This is not possible with the default terms aggregation, because it treats each element of the array as a separate term. You need a terms aggregation with a script. Note that a scripted aggregation runs the script for every matching document, so it may affect cluster performance.

    Here I have used a script that builds a string from the array and uses that string as the aggregation key. An array value like [1,2,3] becomes the string '[1,2,3]', and that string is the bucket key.

    Below is a sample query that produces the aggregation you expect:

    POST index1/_search
    {
      "size": 0,
      "aggs": {
        "tone_s": {
          "terms": {
            "script": {
              "source": "def value='['; for(int i=0;i<doc['ToneCurvePV2012'].length;i++){value= value + doc['ToneCurvePV2012'][i] + ',';} value+= ']'; value = value.replace(',]', ']'); return value;"
            }
          }
        }
      }
    }
    

    Output:

    {
      "hits" : {
        "total" : {
          "value" : 4,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "tone_s" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "[1,2,3]",
              "doc_count" : 2
            },
            {
              "key" : "[1,5,6]",
              "doc_count" : 1
            },
            {
              "key" : "[1,7,8]",
              "doc_count" : 1
            }
          ]
        }
      }
    }
    

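    PS: The key comes back as a string, not as an array, in the aggregation response. Also, doc values return a field's numeric values in sorted order, so the key reflects the sorted values rather than the original order of the array.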
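
To make the difference between the two aggregations concrete, here is a rough client-side sketch in plain Python. It does not talk to Elasticsearch; it only mimics the counting on the four sample docs from the question. The helper name `list_key` and the variable names are made up for illustration. It also shows one possible way to turn the string key back into a list with `json.loads`, assuming the key has the '[1,2,3]' shape produced by the script above.

```python
import json
from collections import Counter

# The four sample documents from the question.
docs = [
    {"ToneCurvePV2012": [1, 2, 3]},
    {"ToneCurvePV2012": [1, 5, 6]},
    {"ToneCurvePV2012": [1, 7, 8]},
    {"ToneCurvePV2012": [1, 2, 3]},
]


def list_key(values):
    """Hypothetical helper: build a '[1,2,3]'-style key, like the script does."""
    return "[" + ",".join(str(v) for v in values) + "]"


# A plain terms aggregation counts each array element on its own
# (one count per document that contains the element).
per_element = Counter(v for d in docs for v in set(d["ToneCurvePV2012"]))

# The scripted aggregation counts whole lists via their string key.
per_list = Counter(list_key(d["ToneCurvePV2012"]) for d in docs)

# Take the top bucket and parse its string key back into a list.
top_key, top_count = per_list.most_common(1)[0]
most_common_list = json.loads(top_key)

print(per_element.most_common(1))  # the single element 1 wins here
print(top_key, top_count, most_common_list)
```

The per-element counter is why the question saw 1 as the result: the value 1 appears in all four docs, while no other single value appears more than twice. Counting by the whole-list key gives '[1,2,3]' with a count of 2, matching the top bucket in the output above.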