
Elasticsearch - can I define index time analyzer on document level?


I want to index pages in multiple languages into a single index, but each language needs its own language analyzer. An English page should use the English analyzer, a Czech page the Czech analyzer, and so on.

At search time I would set the correct analyzer based on the current locale, as I do not need to search across languages.

It appears that this was possible in early versions of Elasticsearch, but I cannot find a way to do it in 7.6.

Is there a way to achieve this, or do I really need to create an index for each type in each language? That would lead to many indices, each holding only a small number of documents.

Or is there a better way to handle this scenario? We are considering about 20 languages and several document types (as far as I understand, types are now deprecated so each needs its own index).


Solution

  • You can use the multi-fields feature (the fields mapping parameter), which is available in Elasticsearch 7.6. It lets you keep all languages in a single index, and at query time you search only the subfield for the language you want.

    In fact, there is an official Elastic blog post that discusses different approaches to multilingual search. The approach shown here is based on the one it calls per-field based language search.

    Example

    A sample index mapping looks like this:

    {
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "english",
                    "fields": {
                        "fr": {
                            "type": "text",
                            "analyzer": "french"
                        },
                        "es": {
                            "type": "text",
                            "analyzer": "spanish"
                        },
                        "estonian": {
                            "type": "text",
                            "analyzer": "estonian"
                        }
                    }
                }
            }
        }
    }
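
The query side of this approach can be sketched as follows. This is only an illustration, assuming the mapping above and a user whose locale is French; the search text is made up. The request body for the search targets the title.fr subfield, so the query string is analyzed with the French analyzer:

```json
{
    "query": {
        "match": {
            "title.fr": "recherche multilingue"
        }
    }
}
```

For an English locale the same query would target title, and for Spanish title.es. Documents are indexed with a single title value; Elasticsearch analyzes that value once per subfield, so every document is processed by every configured language analyzer, which increases index size as more languages are added.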