I want to index pages in multiple languages into a single index. But for each language I need to define custom language analyzer. So for english page it would use english analyzer, for czech page it would use czech analyzer.
At search time I would set the correct analyzer based on current locale as I do not need to search across languages.
It appears that it was possible in the early versions of Elasticsearch, but I cannot find a way to do it in 7.6
Is there a way to achieve this or do I really need to create an index for each type in each language? That would lead to many indices with only small number of indexed documents.
Or is there a better way to handle this scenario? We are considering about 20 languages and several document types (as far as I understand, types are now deprecated so each needs its own index).
You can use the fields feature which is available in Elastic 7.6, which allows you to store the different languages in a single index, also query time it would be possible to just use the subfield of language which you want to query.
In fact, there is a nice official blog from elastic talking about different approaches to have multi-lingual search and approach given by me is inspired by that which is called per-field based language search.
Example
{
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "english",
"fields": {
"fr": {
"type": "text",
"analyzer": "french"
},
"es": {
"type": "text",
"analyzer": "spanish"
},
"estonian": {
"type": "text",
"analyzer": "estonian"
}
}
}
}
}
}