elasticsearch nest elasticsearch-aggregation

Elasticsearch Terms aggregation with unknown datatype


I'm indexing data of unknown schema in Elasticsearch using dynamic mapping, i.e. I don't know the shape, datatypes, etc. of much of the data ahead of time. In queries, I want to be able to aggregate on any field. Strings are (by default) mapped as a text field with a keyword sub-field, and only the latter can be aggregated on. So for strings my terms aggregations must look like this:

"aggs": {
    "something": {
        "terms": {
            "field": "something.keyword"
        }
    }
}

But other types like numbers and bools do not have this .keyword sub-field, so aggregations for those must look like this (which would fail for text fields):

"aggs": {
    "something": {
        "terms": {
            "field": "something"
        }
    }
}

Is there any way to specify a terms aggregation that basically says "if something.keyword exists, use that, otherwise just use something", and without taking a significant performance hit?

Requiring datatype information to be provided at query time might be an option for me, but ideally I want to avoid it if possible.


Solution

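  • If the primary use case is aggregations, it may be worth changing the dynamic mapping for string properties so that they are indexed as a keyword datatype, with a multi-field sub-field indexed as a text datatype. Terms aggregations can then target the bare field name (something) regardless of datatype, and full-text queries can use the something.text sub-field. In dynamic_templates: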

    {
      "strings": {
        "match_mapping_type": "string",
        "mapping": {
          "type": "keyword",
          "ignore_above": 256,
          "fields": {
            "text": {
              "type": "text"
            }
          }
        }
      }
    },
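
To make the effect of the template above concrete, here is a minimal sketch in Python that builds the two request bodies as plain dicts: one for creating the index with the template, and one for a terms aggregation that uses the bare field name. The helper names index_body and terms_agg are hypothetical, and the layout of the mappings section assumes typeless mappings (Elasticsearch 7 or later); on older versions the template would sit under a mapping type name instead.

```python
import json

# Dynamic template from the answer: strings are indexed as `keyword`,
# with a `text` sub-field kept for full-text search.
STRINGS_AS_KEYWORD = {
    "strings": {
        "match_mapping_type": "string",
        "mapping": {
            "type": "keyword",
            "ignore_above": 256,
            "fields": {"text": {"type": "text"}},
        },
    }
}


def index_body():
    """Body for index creation (hypothetical helper).

    Assumption: typeless mappings, i.e. Elasticsearch 7 or later.
    """
    return {"mappings": {"dynamic_templates": [STRINGS_AS_KEYWORD]}}


def terms_agg(field, size=10):
    """Terms aggregation on the bare field name (hypothetical helper).

    With the template above the same body works for strings, numbers
    and booleans, so no `.keyword` suffix is needed.
    """
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggs": {field: {"terms": {"field": field, "size": size}}},
    }


if __name__ == "__main__":
    print(json.dumps(index_body(), indent=2))
    print(json.dumps(terms_agg("something"), indent=2))
```

The printed JSON can be sent with whatever client is in use. Only fields that are dynamically mapped after the template is in place are affected; string fields already mapped as text with a .keyword sub-field keep their existing mapping.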