
ElasticSearch: aggregation filtering


For simplicity, suppose my Elasticsearch index contains two documents:

{"id": 1, "tags": ["t1", "t2", "t3"]}, 
{"id": 2, "tags": ["t1", "t4", "t5"]}

I need to aggregate by a specific set of tags, without getting buckets for the other tags that occur in the matching documents:

{
  "aggs": {
    "tags": {
      "terms": {"field": "tags"}
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {"tags": ["t1", "t2"]}
        }
      ]
    }
  }
}

# RESULT
{
    "aggregations": {
        "tags": {
            "buckets": [
                {"doc_count": 2, "key": "t1"},
                {"doc_count": 1, "key": "t2"},
                {"doc_count": 1, "key": "t3"},  # should be removed by filter
                {"doc_count": 1, "key": "t4"},  # should be removed by filter
                {"doc_count": 1, "key": "t5"},  # should be removed by filter
            ],
        }
    },
    "hits": {
        "hits": [],
        "max_score": 0.0,
        "total": 2
    },
}

How can I restrict the buckets to the tags I filtered on (perhaps with some kind of post-filter)?

With only two documents in the index this means just three extra buckets (t3, t4, t5). But my real index has more than 200K documents, and the result is unmanageable: I need to aggregate by 50 tags, but the response contains buckets for more than 1K tags.
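To see why t3, t4 and t5 show up: the "terms" query only decides which documents match, and the terms aggregation then counts every value of the tags field in those documents, not just the values used in the filter. The Python sketch below is a rough local emulation on the two sample documents (it is not Elasticsearch and ignores shards, "size" and other details; emulate_terms_agg and post_filter_buckets are hypothetical helper names), followed by the naive client-side post-filter.

```python
from collections import Counter

# The two sample documents from the question.
DOCS = [
    {"id": 1, "tags": ["t1", "t2", "t3"]},
    {"id": 2, "tags": ["t1", "t4", "t5"]},
]


def emulate_terms_agg(docs, filter_tags):
    """Rough local emulation of the question's request (not real Elasticsearch).

    The "terms" query keeps every document that has at least one of
    filter_tags; the "terms" aggregation then counts *all* tag values of
    those matching documents.
    """
    wanted = set(filter_tags)
    matching = [d for d in docs if wanted & set(d["tags"])]
    counts = Counter(tag for d in matching for tag in set(d["tags"]))
    # Order by doc_count descending, then by key ascending.
    return [
        {"key": key, "doc_count": n}
        for key, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ]


def post_filter_buckets(buckets, keep):
    """Naive client-side post-filter: drop buckets whose key is not in `keep`."""
    keep = set(keep)
    return [b for b in buckets if b["key"] in keep]


buckets = emulate_terms_agg(DOCS, ["t1", "t2"])
print(buckets)                                   # t1..t5, as in the result above
print(post_filter_buckets(buckets, ["t1", "t2"]))  # only t1 and t2
```

Filtering on the client works for this toy case, but it only sees the buckets Elasticsearch actually returned (the top "size" buckets of the terms aggregation, 10 by default), so with more than 1K distinct tags the wanted tags may already be missing before the filter runs.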


Solution

  • Assuming your version of Elasticsearch supports it, you should use the "include" parameter of the terms aggregation, which restricts the returned buckets to the listed values. Your query would then look like this:

    POST /test/_search
    {
      "aggs": {
        "tags": {
          "terms": {"field": "tags",  "include": ["t1", "t2"]}
        }
      },
      "query": {
        "bool": {
          "filter": [
            {
              "terms": {"tags": ["t1", "t2"]}
            }
          ]
        }
      }
    }
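When the list of tags comes from application code, one way to keep the query filter and the "include" list in sync is to build both from the same list. This is a minimal sketch assuming the field is named tags as in the question; build_tag_search is a hypothetical helper, not part of any Elasticsearch client. It also sets the aggregation's "size" to the number of tags, because the terms aggregation returns only the top 10 buckets by default, which would cut a 50-tag list short.

```python
import json


def build_tag_search(tags):
    """Build a search body that filters on `tags` and aggregates only those tags.

    The same list feeds the query's "terms" filter and the aggregation's
    "include"; "size" is raised to len(tags) so that no included tag is
    dropped by the terms aggregation's default of 10 buckets.
    """
    tags = list(tags)
    if not tags:
        raise ValueError("need at least one tag")
    return {
        "query": {"bool": {"filter": [{"terms": {"tags": tags}}]}},
        "aggs": {
            "tags": {
                "terms": {"field": "tags", "include": tags, "size": len(tags)}
            }
        },
    }


if __name__ == "__main__":
    # Body for the question's example; send it as the JSON payload of
    # POST /test/_search with whatever HTTP client or library you use.
    print(json.dumps(build_tag_search(["t1", "t2"]), indent=2))
```

For the real case, passing the 50 wanted tags yields "size": 50, so all of them can come back as buckets, while tags outside the list are excluded on the server instead of being shipped to the client.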
    
