I'm struggling to implement synonyms in Elasticsearch queries. As I've seen in Synonyms, I can define a filter when creating the index, with the following structure:
"sinonimos": {
  "type": "synonym",
  "explicit": false,
  "synonyms_path": "sinonimos.txt"
}
This filter is used in the analyzer and, if I understood correctly, any of the words in "sinonimos.txt" would be considered a match if my query was "celular". Am I wrong?
The file "sinonimos.txt" is created in the elasticsearch/config folder, and has the following content:
smartphone, telefone => celular
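To make the semantics of that rule concrete, here is a minimal Python sketch of how a Solr-format synonym rule is applied to query tokens. It is a simplified model and not Elasticsearch's implementation: `parse_rule` and `apply_rules` are hypothetical helpers, only single-word terms are handled, and comma-only (equivalent) rules are expanded the way they would be with `expand` set to true.

```python
def parse_rule(line):
    """Parse one Solr-format synonym rule into a {term: [replacements]} dict."""
    if "=>" in line:
        # Explicit mapping: every term on the left is replaced by the right side.
        left, right = line.split("=>", 1)
        targets = [t.strip() for t in right.split(",") if t.strip()]
        return {s.strip(): targets for s in left.split(",") if s.strip()}
    # Equivalent synonyms: every term expands to the whole group.
    group = [t.strip() for t in line.split(",") if t.strip()]
    return {term: group for term in group}


def apply_rules(tokens, rules):
    """Replace each token by its mapped terms; unmapped tokens pass through."""
    out = []
    for token in tokens:
        out.extend(rules.get(token, [token]))
    return out


rules = parse_rule("smartphone, telefone => celular")
print(apply_rules(["smartphone"], rules))  # ['celular']
print(apply_rules(["celular"], rules))     # ['celular']
```

Under this model the rule is one-directional: a query for "smartphone" is rewritten to "celular", while a query for "celular" is left unchanged. For all three words to match each other, the rule would instead be written as `smartphone, telefone, celular`.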
Knowing that I have 4 different values in my index field called "descricao":
When I modify the "sinonimos.txt" file to add a new word, "cellphone", and then call the _reload_search_analyzers API on my index, shouldn't that word be matched too?
The main question here is: how can I implement these synonyms dynamically, so that when I add a new synonym to a word, or a new list of synonyms, I don't have to reindex the current index?
This Elastic blog post says:
Synonyms are used in analyzers that can be used at index time or at search time.
The difference between them is also explained in the Elastic blog:
Index-time synonyms have several disadvantages:
Using synonyms in search-time analyzers on the other hand doesn’t have many of the above mentioned problems:
The idea is that I will add and remove words and lists of words all the time, and I would like to do that dynamically, without having to reindex or _close > update > _open the index. Is that possible?
Edit: The problem was that I was trying to use the synonym analyzer as the index analyzer instead of as the search_analyzer.
"filter": {
  "synonym": {
    "type": "synonym_graph",
    "synonyms_path": "analysis/synonym.txt",
    "updateable": true
  }
}
To update the synonyms list, edit the synonym.txt file and call POST /index_name/_reload_search_analyzers, then clear the request cache with POST /index_name/_cache/clear?request=true
Because reloading affects every node with an index shard, it's important to update the synonym file on every data node in the cluster, including nodes that don’t contain a shard replica, before using this API. This ensures the synonym file is updated everywhere in the cluster in case shards are relocated in the future.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-reload-analyzers.html#indices-reload-analyzers-api-desc
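The update steps above (edit the file on every node, reload the search analyzers, clear the request cache) could be scripted roughly as follows. This is a sketch using only the Python standard library; `build_reload_requests` and `reload_synonyms` are hypothetical helper names, and the base URL and index name are assumptions taken from the examples in this answer. It assumes an unsecured local cluster and does not copy the synonym file to the nodes.

```python
import urllib.request


def build_reload_requests(base_url, index):
    """Build (without sending) the two POST requests used after editing the synonym file."""
    paths = [
        f"/{index}/_reload_search_analyzers",   # re-read updateable synonym files
        f"/{index}/_cache/clear?request=true",  # drop results cached with the old synonyms
    ]
    return [urllib.request.Request(base_url + p, method="POST") for p in paths]


def reload_synonyms(base_url, index):
    """Send both requests in order and return the raw response bodies."""
    bodies = []
    for req in build_reload_requests(base_url, index):
        with urllib.request.urlopen(req) as resp:
            bodies.append(resp.read().decode("utf-8"))
    return bodies


# Show what would be sent; call reload_synonyms(...) against a live cluster.
for req in build_reload_requests("http://localhost:9200", "index_name"):
    print(req.get_method(), req.full_url)
```

Keeping the request construction separate from the sending makes it easy to inspect the calls before running them against a real cluster.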
curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_synonyms": {
            "tokenizer": "whitespace",
            "filter": [ "synonym" ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym_graph",
            "synonyms_path": "analysis/synonym.txt",
            "updateable": true
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "my_synonyms"
      }
    }
  }
}
'
Recommendation: set "profile": true and check the results. The profile output shows the query that was actually run, including the terms produced by your search_analyzer.
GET test_synonym/_search
{
  "profile": true,
  "query": {
    "match": {
      "text": {
        "query": "test search",
        "analyzer": "my_synonyms"
      }
    }
  }
}
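Another way to check that a reloaded synonym is picked up is to ask the _analyze API which tokens the analyzer produces for a given word. The sketch below is an assumption-laden illustration, not part of the original setup: `build_analyze_request` and `analyzed_tokens` are hypothetical helpers, and the base URL, index name, and sample word are taken from the examples above.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Build (without sending) a request to the _analyze API for one analyzer."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyzed_tokens(base_url, index, analyzer, text):
    """Send the request and return the token strings the analyzer produced."""
    req = build_analyze_request(base_url, index, analyzer, text)
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return [t["token"] for t in payload["tokens"]]


# Show what would be sent; call analyzed_tokens(...) against a live cluster.
req = build_analyze_request(
    "http://localhost:9200", "my-index-000001", "my_synonyms", "cellphone"
)
print(req.full_url, req.data.decode("utf-8"))
```

If the tokens returned before and after the reload differ in the expected way, the updated synonym file is being used by the search analyzer.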