Is it possible to use Elasticsearch synonyms dynamically?


I'm struggling to implement synonyms in Elasticsearch queries. As described in Synonyms, I can define a filter when creating the index, with the following structure:

"sinonimos": {
  "type": "synonym",
  "explicit": false,
  "synonyms_path": "sinonimos.txt"
}

This filter is used in the analyzer and, if I understood correctly, any of the words in "sinonimos.txt" would be considered a match if my query were "celular". Am I wrong?

The file "sinonimos.txt" is in the elasticsearch/config folder and has the following content:

smartphone, telefone => celular

My index field "descricao" holds 4 different values:

  • celular
  • smartphone
  • telefone
  • cellphone

When I add a new word, "cellphone", to "sinonimos.txt" and then call the _reload_search_analyzers API on my index, shouldn't that word be matched too?

The main question is: how can I implement these synonyms dynamically, so that when I add a new synonym to a word, or a new list of synonyms, I don't have to reindex the current index?

This Elastic blog post says:

Synonyms are used in analyzers that can be used at index time or at search time.

The difference between the two is clear and is also described in the same blog post:

Index-time synonyms have several disadvantages:

  • The index might get bigger, because all synonyms must be indexed.
  • Search scoring, which relies on term statistics, might suffer because synonyms are also counted, and the statistics for less common words become skewed.
  • Synonym rules can’t be changed for existing documents without reindexing.

Using synonyms in search-time analyzers on the other hand doesn’t have many of the above mentioned problems:

  • The index size is unaffected.
  • The term statistics in the corpus stay the same.
  • Changes in the synonym rules don’t require reindexing of documents.

The point is that I will be adding and removing words and lists of words all the time, and I would like to do that dynamically so I don't have to reindex or _close > update > _open the index. Is that possible?

Edit: The problem was that I was trying to use the synonym analyzer as the index analyzer instead of as the search_analyzer.


Solution

  • Some useful information about the Elasticsearch synonym token filter:

    "filter": {
      "synonym": {
        "type": "synonym_graph",
        "synonyms_path": "analysis/synonym.txt",  
        "updateable": true                        
      }
    }
    

    To update the synonyms list, edit the synonym.txt file, call POST /index_name/_reload_search_analyzers, and then clear the request cache with POST /index_name/_cache/clear?request=true.

    Because reloading affects every node with an index shard, it's important to update the synonym file on every data node in the cluster, including nodes that don’t contain a shard replica, before using this API. This ensures the synonym file is updated everywhere in the cluster in case shards are relocated in the future.

    See https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-reload-analyzers.html#indices-reload-analyzers-api-desc
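If you want to script the two calls above rather than run them by hand, here is a minimal Python sketch using only the standard library. The function names (build_reload_requests, reload_synonyms) are made up for illustration, and it assumes an unsecured cluster reachable over plain HTTP; authentication and TLS are left out. It does not copy the synonym file to the nodes, which still has to happen first.

```python
import urllib.request
from typing import List


def build_reload_requests(base_url: str, index: str) -> List[urllib.request.Request]:
    """Build the two POST requests: reload search analyzers, then clear the request cache."""
    base = base_url.rstrip("/")
    urls = [
        f"{base}/{index}/_reload_search_analyzers",
        f"{base}/{index}/_cache/clear?request=true",
    ]
    return [urllib.request.Request(url, method="POST") for url in urls]


def reload_synonyms(base_url: str, index: str) -> List[str]:
    """Send both requests in order and return the raw response bodies."""
    bodies = []
    for request in build_reload_requests(base_url, index):
        # urlopen raises urllib.error.HTTPError on a non-2xx response.
        with urllib.request.urlopen(request, timeout=30) as response:
            bodies.append(response.read().decode("utf-8"))
    return bodies
```

With a local node, the call would look like reload_synonyms("http://localhost:9200", "my-index-000001"). Building the requests separately from sending them makes it easy to check the URLs without a running cluster.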

    curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
    {
      "settings": {
        "index": {
          "analysis": {
            "analyzer": {
              "my_synonyms": {
                "tokenizer": "whitespace",
                "filter": [ "synonym" ]
              }
            },
            "filter": {
              "synonym": {
                "type": "synonym_graph",
                "synonyms_path": "analysis/synonym.txt",  
                "updateable": true                        
              }
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "text": {
            "type": "text",
            "analyzer": "standard",
            "search_analyzer": "my_synonyms"              
          }
        }
      }
    }
    '
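The mistake mentioned in the question's edit (using the synonym analyzer as the index analyzer) can be caught before the index is created, because an updateable filter is meant for search analyzers only. Below is a rough Python sketch of such a check. Both function names are hypothetical, it only looks at top-level fields under mappings.properties (no nested objects or multi-fields), and it is not a substitute for Elasticsearch's own validation.

```python
from typing import List, Set


def updateable_analyzers(index_body: dict) -> Set[str]:
    """Return the names of analyzers that use at least one updateable token filter."""
    settings = index_body.get("settings", {})
    # Accept both "settings.index.analysis" and "settings.analysis".
    analysis = settings.get("index", settings).get("analysis", {})
    updateable_filters = {
        name
        for name, config in analysis.get("filter", {}).items()
        if config.get("updateable") is True
    }
    return {
        name
        for name, config in analysis.get("analyzer", {}).items()
        if updateable_filters.intersection(config.get("filter", []))
    }


def fields_indexing_with_updateable(index_body: dict) -> List[str]:
    """List fields whose index-time "analyzer" is one of the updateable analyzers."""
    risky = updateable_analyzers(index_body)
    properties = index_body.get("mappings", {}).get("properties", {})
    return sorted(
        field
        for field, config in properties.items()
        if config.get("analyzer") in risky
    )
```

For the request body shown above, the second function returns an empty list, since my_synonyms appears only as search_analyzer. A field declared with "analyzer": "my_synonyms" would be reported.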
    

    Recommendation: set "profile": true in the search request and check the results. The profile output describes the query that was actually executed, so you can see which terms your search_analyzer produced.

    GET my-index-000001/_search
    {
      "profile": true,
      "query": {
        "match": {
          "text": {
            "query": "test search",
            "analyzer": "my_synonyms"
          }
        }
      }
    }
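One more note on the rule from the question, "smartphone, telefone => celular". The "=>" form is an explicit mapping: terms on the left are replaced by the term on the right, and nothing happens in the other direction. The Python sketch below is a simplified model of that behaviour for single-word terms. It is not how Elasticsearch implements the filter, and parse_rules and expand_query are names invented for this illustration.

```python
from typing import Dict, List


def parse_rules(text: str) -> Dict[str, List[str]]:
    """Parse single-word Solr-style synonym rules into a term -> replacements map."""
    mapping: Dict[str, List[str]] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: each left-hand term is replaced by the right-hand terms.
            left, right = line.split("=>", 1)
            targets = [term.strip() for term in right.split(",") if term.strip()]
            sources = [term.strip() for term in left.split(",") if term.strip()]
        else:
            # Equivalent terms: every term expands to the whole group.
            targets = [term.strip() for term in line.split(",") if term.strip()]
            sources = targets
        for source in sources:
            mapping.setdefault(source, []).extend(targets)
    return mapping


def expand_query(tokens: List[str], mapping: Dict[str, List[str]]) -> List[str]:
    """Apply the rules to query tokens; tokens without a rule pass through unchanged."""
    expanded: List[str] = []
    for token in tokens:
        expanded.extend(mapping.get(token, [token]))
    return expanded
```

Under this model, with the original rule a search for "smartphone" is rewritten to "celular", while a search for "celular" stays "celular" and so would not find documents that were indexed as "smartphone". Adding "cellphone" to the left-hand side makes a search for "cellphone" find "celular" documents. If the goal is for any of the words to find all of the others, an equivalent-terms line such as "celular, smartphone, telefone, cellphone" is probably closer to what is wanted.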