ElasticSearch performance considerations while mapping string fields as both text and keyword?


I have a question regarding the tradeoffs/performance considerations to keep in mind while mapping string fields as both text and keyword vs just one of those.

I have a use case where mapping around 25-30 string fields as both text and keyword would be nice to have. If that carries serious performance costs, though, I would instead go through the fields and map each one only as the type it will most often be searched as.

I have not been able to find much information online about this. Hence asking here.

ElasticSearch Version 7.10 Thanks!


Solution

  • The default mappings provided by ES map a string field as both text and keyword mostly because it's convenient: the field can then be used in different contexts without having to think too hard about it. It's also a good way of bootstrapping new projects and not worrying too much about that aspect until later in the project.
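For reference, here is a small sketch of what that default looks like, written as a Python dict so it can be dumped to JSON. The field names `title` and `sku` and the helper `default_mappings` are made up for illustration; the shape of the mapping (a `text` field with a `keyword` sub-field and `ignore_above: 256`) is what dynamic mapping produces for a new string field.

```python
import json


def default_string_mapping():
    """Mapping that dynamic mapping generates for a previously unseen string field."""
    return {
        "type": "text",  # analyzed, used for full-text queries
        "fields": {
            # Un-analyzed sub-field for exact matches, sorting and aggregations.
            # Values longer than 256 characters are not indexed in this sub-field.
            "keyword": {"type": "keyword", "ignore_above": 256}
        },
    }


def default_mappings(field_names):
    """Hypothetical helper: apply the default string mapping to each field name."""
    return {"properties": {name: default_string_mapping() for name in field_names}}


# Hypothetical field names, for illustration only.
body = default_mappings(["title", "sku"])
print(json.dumps(body, indent=2))
```

With this mapping, full-text queries go against `title`, while exact matches, sorting and aggregations go against `title.keyword`. Every value is therefore indexed twice, which is the cost the question is asking about.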

    However, if you're truly serious about your mappings and the performance of your cluster, you should always give as much thought as possible to why you map a field in a certain way.

    There are a few basic rules (but your mileage may always vary) in the following (non-exhaustive) list:

    • IDs, codes, keys, etc, that you usually use in exact searches can be mapped as keyword only (and/or wildcard depending on your search use cases).
    • If you have longer pieces of text closer to natural language that you might want to run full-text searches on, it's usually a good idea to map them as text.
    • The corollary to the previous rule is that if you know that you'll never want to run full-text searches on some field, don't map it as text, because analyzing text fields at indexing time adds non-negligible overhead.
    • ...
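The rules above could be sketched roughly as follows. This is only an illustration under assumptions: the field names in `FIELD_USAGE` and the helper `map_string_field` are invented for the example, and the resulting dict is meant to be sent as the body of an index-creation request (typeless mappings, as in ES 7.x).

```python
import json


def map_string_field(exact=False, full_text=False):
    """Pick a mapping for a string field from how it will be queried."""
    if exact and full_text:
        # Both use cases are real: keep the text field plus a keyword sub-field.
        return {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
    if full_text:
        # Natural-language content searched with full-text queries only.
        return {"type": "text"}
    if exact:
        # IDs, codes, keys: exact matches, sorting, aggregations.
        return {"type": "keyword"}
    raise ValueError("decide how the field is queried before mapping it")


# Hypothetical fields and usage patterns, for illustration only.
FIELD_USAGE = {
    "order_id": {"exact": True},
    "description": {"full_text": True},
    "product_name": {"exact": True, "full_text": True},
}

index_body = {
    "mappings": {
        "properties": {
            name: map_string_field(**usage) for name, usage in FIELD_USAGE.items()
        }
    }
}
print(json.dumps(index_body, indent=2))
```

Only `product_name` pays for double indexing here; the other two fields are indexed once, in the form they are actually queried.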

    As said, the above list is non-exhaustive, but it gives you some pointers. The bottom line is that you need to think hard about your data and what you want to do with it. Once you know the use cases you need to support, you'll know how to map your fields. I would never leave a default text/keyword mapping in place if there's no reason for it.