elasticsearch, nest, elasticsearch-dsl

ElasticSearch aggregates on nested fields


I have an index with the following structure.

{
  "title": "Your top FIY tips",
  "content": "Fix It Yourself in April 2012.",
  "tags": [
    {
      "tagName": "Fix it yourself"
    },
    {
      "tagName": "customer tips"
    },
    {
      "tagName": "competition"
    }
  ]
}

The mapping looks like this:

{
  "articles": {
    "mappings": {
      "article": {
        "properties": {
          "content": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "tags": {
            "type": "nested",
            "properties": {
              "tagName": {
                "type": "text",
                "fields": {
                  "raw": {
                    "type": "keyword"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

I am using the following DSL query to search the "content" and "title" fields and narrow the results down to a particular "tagName". I then use an aggregation to count the tagNames within the results of that query.

GET /articles/_search
{
  "from": 1,
  "size": 10,
  "aggs": {
    "tags": {
      "nested": {
        "path": "tags"
      },
      "aggs": {
        "tags-tagnames": {
          "terms": {
            "field": "tags.tagName.raw"
          }
        }
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "FIY",
            "fields": [
              "title",
              "content"
            ]
          }
        },
        {
          "nested": {
            "query": {
              "terms": {
                "tags.tagName": [
                  "competition"
                ]
              }
            },
            "path": "tags"
          }
        }
      ]
    }
  }
}

The search query and the filter on "tagName" work fine. However, the aggregation is not quite working: it doesn't seem to take the nested query into account, and the aggregation results that come back appear to be based only on the multi_match search.

How can I make the aggregation take the nested query into account?

Sample documents are available at:

https://gist.github.com/anonymous/83bc2b1bfa0ac0d295d42297e1d76c00


Solution

  • After our discussion, I think I understand your problem better:

    You want to run the aggregation only on the documents that are included in the page selected by the "from" and "size" parameters of the request.

    "from" and "size" only affect which hits are returned for the query; aggregations are calculated over all documents that match the query.

    What you want to do is currently not possible due to the way in which Elasticsearch works. There are two phases to a search request in Elasticsearch:

    Query phase

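    In the query phase, the request is executed on one copy of every shard of the target index. Each shard returns the ids (and sort values) of its top-matching documents to the coordinating node, which merges them to pick the requested page. Aggregations also run in the query phase: each shard aggregates over all of its matching documents, and the per-shard results are then combined on the coordinating node.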

    Fetch phase

    In the fetch phase, the actual documents for the ids selected in the query phase (only those on the requested page) are fetched and included in the response. In your scenario, the aggregation would need to run in the fetch phase so that it covered only the documents on that page.

    The only way to change which documents the aggregation takes into account is to add queries or filters to the query part of the request, but as far as I am aware there is no query that selects "documents in sort-order positions 1 to 10".

    For this particular use case, you could always aggregate on the client side instead, because you are effectively aggregating on the verbatim value of each tag.
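Here is a minimal Python sketch of the client-side route. It assumes the search response has already been parsed into a dict (for example with `json.loads`) and that `_source` is returned for each hit. The helper names `tag_counts_for_page` and `as_buckets` and the sample response are hypothetical; they are not part of any Elasticsearch client. The sketch counts every `tags[].tagName` on the returned page. It then reshapes the counts into `key`/`doc_count` buckets like those a terms aggregation returns, so the result reflects only the documents selected by "from" and "size".

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def tag_counts_for_page(response: dict[str, Any]) -> Counter[str]:
    """Count tagName values over the hits in one parsed search response.

    Only the hits on the returned page (selected by "from" and "size") are
    seen here, unlike a terms aggregation, which covers every matching doc.
    """
    counts: Counter[str] = Counter()
    for hit in response.get("hits", {}).get("hits", []):
        # Assumes _source is enabled and includes the nested "tags" array.
        for tag in hit.get("_source", {}).get("tags", []):
            name = tag.get("tagName")
            if name is not None:
                # Count the verbatim value, as the tags.tagName.raw keyword
                # sub-field would.
                counts[name] += 1
    return counts


def as_buckets(counts: Counter[str]) -> list[dict[str, Any]]:
    """Reshape counts into terms-aggregation-style buckets.

    Sorted by descending count; ties are broken alphabetically here so the
    output is deterministic.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


# Hypothetical, hand-written response with two hits on the current page.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {
                    "title": "Your top FIY tips",
                    "tags": [
                        {"tagName": "Fix it yourself"},
                        {"tagName": "customer tips"},
                        {"tagName": "competition"},
                    ],
                },
            },
            {
                "_id": "2",
                "_source": {
                    "title": "FIY competition winners",
                    "tags": [{"tagName": "competition"}],
                },
            },
        ]
    }
}

for bucket in as_buckets(tag_counts_for_page(sample_response)):
    print(bucket)
# {'key': 'competition', 'doc_count': 2}
# {'key': 'Fix it yourself', 'doc_count': 1}
# {'key': 'customer tips', 'doc_count': 1}
```

Because this runs on the response after the fetch phase, it only sees the page that was returned. The server-side nested terms aggregation still reports counts for the whole matching result set, so you may want to keep both.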