Search code examples
elasticsearchaggregationelasticsearch-aggregationelasticsearch-dsl

Count the number of elements in a list field in Elastic Search


I'm still learning to use DSL queries in ElasticSearch. I have documents where one field is a list. I need to count the number of documents that have one element in this field, two elements in this field, etc. For example, here is a document structure:

Document1:

"Volume": [
{
"partition": "s1",
"fieldtype": ["A","B"]
}
]

Document 2:

"Volume": [
{
"partition": "s1",
"fieldtype": ["A"]
}
]

Document 3:

"Volume": [
{
"partition": "s1",
"fieldtype": ["B"]
}
]

I need a way to calculate that there is one document with 2 elements in fieldtype field and 2 documents with one element in fieldtype. If I try to aggregate them like this:

"size":0,
"aggs": {
"name": {
"terms": {
"field": "fieldtype.keyword"
}
}
}

I get counts of elements (number of As and Bs). Without using keyword, I get an error.


Solution

  • @rabbitbr provided a good answer, but I could not understand why we tried to use a nested field. And, I think we need to use terms aggregation instead of sum here. Anyhow, here is a solution without nested :

    PUT idx_test
    
    POST idx_test/_bulk
    {"index":{ "_id": 1}}
    {"Volume":[{"partition": "s1","fieldtype": ["A","B"]}]}
    {"index":{ "_id": 2}}
    {"Volume":[{"partition": "s1","fieldtype": ["A"]}]}
    {"index":{ "_id": 3}}
    {"Volume":[{"partition": "s1","fieldtype": ["B"]}]}
    
    GET idx_test/_mapping
    
    GET idx_test/_search
    {
      "size": 0,
      "aggs": {
        "size": {
          "terms": {
            "script": {
              "lang": "painless",
              "source": "doc['Volume.fieldtype.keyword'].size()"
            }
          }
        }
      }
    }
    

    Without using keyword, I get an error.

    This is normal because without keyword you are trying to build an aggregation on a field whose type is text.

    Here the response for the query above which is pretty basic query :

    {
      ....
      "aggregations": {
        "size": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "1",
              "doc_count": 2
            },
            {
              "key": "2",
              "doc_count": 1
            }
          ]
        }
      }
    }
    

    As you can see, we have 2 documents with 1 sized array and we have 1 document with 2 sized array.