Tags: elasticsearch, elasticsearch-aggregation

Terms Aggregation based on Distinct Terms per Collection


I have documents which look like this:

{
  "foo": null,
  "bars": [
    {
      "baz": "BAZ",
      "qux": null,
      "bears": [
        {
          "fruit": "banana"
        }
      ]
    }
  ]
}

I want one bucket per fruit term, where each count is the number of bars objects that contain at least one bear with that fruit. For instance, given the following documents:

{
  "foo": null,
  "bars": [
    {
      "baz": "BAZ",
      "qux": null,
      "bears": [
        {
          "fruit": "banana"
        },
        {
          "fruit": "banana"
        },
        {
          "fruit": "apple"
        }
      ]
    },
    {
      "baz": "BAZ",
      "qux": null,
      "bears": [
        {
          "fruit": "banana"
        }
      ]
    }
  ]
}
{
  "foo": null,
  "bars": [
    {
      "baz": "BAZ",
      "qux": null,
      "bears": [
        {
          "fruit": "apple"
        },
        {
          "fruit": "apple"
        },
        {
          "fruit": "orange"
        }
      ]
    }
  ]
}

I would like results that look like this:

"buckets": [
  {
    "key": "banana",
    "doc_count": 2
  },
  {
    "key": "apple",
    "doc_count": 2
  },
  {
    "key": "orange",
    "doc_count": 1
  }
]

That is, banana appears as a descendant of 2 distinct bars, apple appears as a descendant of 2 distinct bars, and orange appears as a descendant of 1 distinct bar.

Right now I have the following aggregation, which counts every bears object that has each fruit:

{
  "aggs": {
    "global": {
      "global": {},
      "aggs": {
        "bars": {
          "nested": {
            "path": "bars"
          },
          "aggs": {
            "bears": {
              "nested": {
                "path": "bars.bears"
              },
              "aggs": {
                "fruits": {
                  "terms": {
                    "field": "bars.bears.fruit"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

This results in the following:

"buckets": [
  {
    "key": "banana",
    "doc_count": 3
  },
  {
    "key": "apple",
    "doc_count": 3
  },
  {
    "key": "orange",
    "doc_count": 1
  }
]

Which is not what I'm looking for. Is it possible to modify this query to count the distinct bars which contain each fruit?
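
To make the difference between the two counts concrete, here is a minimal sketch in plain Python (it does not call Elasticsearch) that reproduces both sets of numbers on the two sample documents above. The helper names `count_per_bear` and `count_per_bar` are made up for illustration, and the documents are reduced to the fields that matter.

```python
from collections import Counter

# The two sample documents from the question, reduced to bars -> bears -> fruit.
docs = [
    {"bars": [
        {"bears": [{"fruit": "banana"}, {"fruit": "banana"}, {"fruit": "apple"}]},
        {"bears": [{"fruit": "banana"}]},
    ]},
    {"bars": [
        {"bears": [{"fruit": "apple"}, {"fruit": "apple"}, {"fruit": "orange"}]},
    ]},
]


def count_per_bear(docs):
    """Count every bears object once: this mirrors the 3 / 3 / 1 result."""
    counts = Counter()
    for doc in docs:
        for bar in doc["bars"]:
            for bear in bar["bears"]:
                counts[bear["fruit"]] += 1
    return dict(counts)


def count_per_bar(docs):
    """Count each bar at most once per fruit: this is the desired 2 / 2 / 1 result."""
    counts = Counter()
    for doc in docs:
        for bar in doc["bars"]:
            # A set removes repeated fruits within the same bar.
            counts.update({bear["fruit"] for bear in bar["bears"]})
    return dict(counts)


print(count_per_bear(docs))  # {'banana': 3, 'apple': 3, 'orange': 1}
print(count_per_bar(docs))   # banana 2, apple 2, orange 1 (key order may vary)
```

The only difference is the level at which duplicates are collapsed: the first function counts bears, the second counts the bars that own them.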


Solution

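  • I actually managed to get the results I was looking for, although in a slightly different shape: each fruit bucket carries the count I wanted in a reverse_nested sub-aggregation named fruit_to_bears.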

    Query

    {
      "aggs": {
        "global": {
          "global": {},
          "aggs": {
            "bars": {
              "nested": {
                "path": "bars"
              },
              "aggs": {
                "bears": {
                  "nested": {
                    "path": "bars.bears"
                  },
                  "aggs": {
                    "fruits": {
                      "terms": {
                        "field": "bars.bears.fruit"
                      },
                      "aggs": {
                        "fruit_to_bears": {
                          "reverse_nested": {
                            "path": "bars"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    

    Results

    "buckets": [
      {
        "key": "banana",
        "doc_count": 3,
        "fruit_to_bears": {
          "doc_count": 2
        }
      },
      {
        "key": "apple",
        "doc_count": 3,
        "fruit_to_bears": {
          "doc_count": 2
        }
      },
      {
        "key": "orange",
        "doc_count": 1,
        "fruit_to_bears": {
          "doc_count": 1
        }
      }
    ]
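
If the originally requested shape is needed, the buckets can be reshaped on the client. The following is a minimal Python sketch, assuming the response buckets look exactly like the results above. The function name `distinct_bar_buckets` and its `sub_agg` parameter are hypothetical, not part of any Elasticsearch client.

```python
def distinct_bar_buckets(fruit_buckets, sub_agg="fruit_to_bears"):
    """Replace each bucket's doc_count with the count from its reverse_nested sub-aggregation."""
    return [
        {"key": bucket["key"], "doc_count": bucket[sub_agg]["doc_count"]}
        for bucket in fruit_buckets
    ]


# Buckets copied from the results above.
buckets = [
    {"key": "banana", "doc_count": 3, "fruit_to_bears": {"doc_count": 2}},
    {"key": "apple", "doc_count": 3, "fruit_to_bears": {"doc_count": 2}},
    {"key": "orange", "doc_count": 1, "fruit_to_bears": {"doc_count": 1}},
]

print(distinct_bar_buckets(buckets))
# [{'key': 'banana', 'doc_count': 2}, {'key': 'apple', 'doc_count': 2}, {'key': 'orange', 'doc_count': 1}]
```

One thing to keep in mind: a reverse_nested aggregation joins back to the root document unless a path is given, so the sub-aggregation counts distinct bars only when its path is set to bars. Without a path, banana would be counted once here, because both of its bars belong to the same root document.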