
How to bucket empty and non-empty fields in a nested aggregation in Elasticsearch?


I have the following set of nested sub-aggregations in Elasticsearch (field2 is a sub-aggregation of field1, and field3 is a sub-aggregation of field2). However, the terms aggregation on field3 does not bucket documents that don't have field3.

My understanding is that I have to use a missing sub-aggregation, in addition to the terms aggregation on field3, to bucket those documents.

But I am not sure how to add it to the query below so that both sets of documents are bucketed.

{
  "size": 0,
  "aggregations": {
    "f1": {
      "terms": {
        "field": "field1",
        "size": 0,
        "order": {
          "_count": "asc"
        },
        "include": [
          "123"
        ]
      },
      "aggregations": {
        "field2": {
          "terms": {
            "field": "f2",
            "size": 0,
            "order": {
              "_count": "asc"
            },
            "include": [
              "tr"
            ]
          },
          "aggregations": {
            "field3": {
              "terms": {
                "field": "f3",
                "order": {
                  "_count": "asc"
                },
                "size": 0
              },
              "aggregations": {
                "aggTopHits": {
                  "top_hits": {
                    "size": 1
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Solution

  • In version 2.1.2 and later, you can use the missing parameter of the terms aggregation, which lets you specify a default value for documents that lack the field. (FYI, the missing parameter has been available since 2.0, but a bug prevented it from working in sub-aggregations, which is how you would use it here.)

         ...
         "aggregations": {
            "field3": {
              "terms": {
                "field": "f3",
                "order": {
                  "_count": "asc"
                },
                "size": 0,
                "missing": "n/a"     <----- provide a default here
              },
              "aggregations": {
                "aggTopHits": {
                  "top_hits": {
                    "size": 1
                  }
                }
              }
            }
          }
    

    However, if you are working with a pre-2.x ES cluster (or a 2.x release still affected by that bug), you can add a missing aggregation at the same depth as your field3 aggregation to bucket the documents that lack "f3", like this:

         ...
         "aggregations": {
            "field3": {
              "terms": {
                "field": "f3",
                "order": {
                  "_count": "asc"
                },
                "size": 0
              },
              "aggregations": {
                "aggTopHits": {
                  "top_hits": {
                    "size": 1
                  }
                }
              }
            },
            "missing_field3": {
              "missing" : {
                "field": "f3"
              },
              "aggregations": {
                "aggTopMissingHit": {
                  "top_hits": {
                    "size": 1
                  }
                }
              }
            }
          }
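
With the missing aggregation approach, the documents without "f3" come back under a separate key ("missing_field3") rather than as one more entry in the "field3" buckets, so you may want to stitch the two together on the client. Below is a minimal Python sketch of that step. It assumes the usual response shapes (a terms aggregation returns a "buckets" list, a missing aggregation returns a single object with "doc_count"). The helper name merge_missing_bucket, the "n/a" label, and the sample values are made up for illustration and are not part of any Elasticsearch client.

```python
def merge_missing_bucket(parent_bucket, terms_name="field3",
                         missing_name="missing_field3", missing_key="n/a"):
    """Return the terms buckets plus one synthetic bucket for the missing aggregation.

    parent_bucket is one bucket of the enclosing aggregation (here, one
    "field2" bucket) taken from the search response.
    """
    buckets = list(parent_bucket[terms_name]["buckets"])

    missing = parent_bucket.get(missing_name)
    if missing and missing.get("doc_count", 0) > 0:
        # Copy so that sub-aggregations (e.g. aggTopMissingHit) are kept,
        # then give the bucket a key so it looks like a terms bucket.
        synthetic = dict(missing)
        synthetic["key"] = missing_key  # hypothetical label for "no f3"
        buckets.append(synthetic)

    # Mirror the "_count": "asc" ordering used in the query.
    buckets.sort(key=lambda b: b["doc_count"])
    return buckets


# Hand-written sample in the shape of one "field2" bucket (values are invented).
sample_field2_bucket = {
    "key": "tr",
    "doc_count": 6,
    "field3": {"buckets": [{"key": "a", "doc_count": 1},
                           {"key": "b", "doc_count": 3}]},
    "missing_field3": {"doc_count": 2},
}

merged = merge_missing_bucket(sample_field2_bucket)
print([(b["key"], b["doc_count"]) for b in merged])
```

In a real response you would call this once per "field2" bucket, i.e. for each entry under aggregations → f1 → buckets → field2 → buckets. If you are on 2.1.2 or later and use the missing parameter instead, this step should not be needed, since the default value shows up as an ordinary bucket key.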