Search code examples
elasticsearchopensearch

Is there a query to count only the root documents in a mapping with a nested field?


I have a mapping that contains a nested field, and based on what I read from the documentation I understand that ES expands each of the items in the nested field into its own document + the root document. But due to a business logic, I need to count and query only the root documents. Is there a way to achieve this? I've been reading that a reverse_nested can be the option to achieve this, but in order to use it, I have to query at least one field in the nested field, is that right?

UPDATE 1: The count I need to do is literally an aggregation, to count by a given field but only on root objects. Something like this

PUT test
{
  "mappings": {
    "properties": {
      "flag": "keyword"
      "topics": {
        "type": "nested"
      }
    }
  }
}

And then, we have documents like this POST test/_bulk

{"index":{"_index":"test" }}
{"flag":"A", "topics":[ {"topic":"topic 1"}, {"topic":"topic 2"}], "_id": 1}
{"index":{"_index":"test" }}
{"flag":"A", "topics":[ {"topic":"topic 2"}, {"topic":"topic 3"}], "_id": 2}
{"index":{"_index":"test" }}
{"flag":"B", "topics":[ {"topic":"topic 1"}, {"topic":"topic 3"}], "_id": 2}

What I need is to be 100% that I can get a query that can return a count by flag field so I can get

{
   "key": "A"
   "doc_count": 2
},
{
   "key": "B"
   "doc_count": 1
}

Insead of

{
   "key": "A"
   "doc_count": 6
},
{
   "key": "B"
   "doc_count": 3
}

Does this makes sense?


Solution

  • PUT test_nested_count
    {
      "mappings": {
        "properties": {
          "event": {
            "type": "nested"
          }
        }
      }
    }
    

    POST test_nested_count/_doc/1?refresh
    {
      "event": [
        {
          "original": "nested_value_1",
          "other_field": "first_value"
        },
        {
          "original": "nested_value_2",
          "other_field": "second_value"
        }
      ]
    }
    

    GET _cat/count/test_nested_count?v
    

    GET _cat/indices/test_nested_count?v&h=index,docs.count
    

    You can compare the output of _cat/indices and _count API results. The _count API results will show the number of documents, the _cat/indices API call will show the number of nested documents.

    The output:

    # GET _cat/indices/test_nested_count?v&h=index,docs.count 200 OK
    index             docs.count
    test_nested_count          3
    
    # GET _cat/count/test_nested_count?v 200 OK
    epoch      timestamp count
    1705408466 12:34:26  1
    

    enter image description here

    UPDATE:

    enter image description here