
How can I get auto-suggestions for synonym matches in Elasticsearch?


I'm using the code below, and it does not return "curd" as an auto-suggestion when I type "cu".

It does match the document containing "yogurt" when I search for a full synonym, which is correct. How can I get both auto-completion on synonym words and document matching from the same setup?

PUT products
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonym_graph"
            ]
          }
        },
        "filter": {
          "synonym_graph": {
            "type": "synonym_graph",
            "synonyms": [
               "yogurt, curd, dahi"
            ]
          }
        }
      }
    }
  }
}
PUT products/_mapping
{
  "properties": {
    "description": {
      "type": "text",
      "analyzer": "synonym_analyzer"
    }
  }
}
POST products/_doc
{
  "description": "yogurt"
}
GET products/_search
{
  "query": {
    "match": {
      "description": "cu"
    }
  }
}

Solution

  • Listing synonyms in a synonym_graph filter simply means that ES will treat those terms as interchangeable. But because synonym_analyzer is built on the standard tokenizer, only full-word tokens are produced:

    POST products/_analyze?filter_path=tokens.token
    {
      "text": "yogurt",
      "field": "description"
    }
    

    yielding:

    {
      "tokens" : [
        {
          "token" : "curd"
        },
        {
          "token" : "dahi"
        },
        {
          "token" : "yogurt"
        }
      ]
    }
    

    As such, a regular match query won't cut it here: "cu" is not equal to any of the indexed tokens, and the analyzer has not produced any matchable substrings (n-grams) for it to hit.

    Instead, you can replace match with match_phrase_prefix, which treats the last term of the query as a prefix and so matches "cu" against the indexed synonym curd:

    GET products/_search
    {
      "query": {
        "match_phrase_prefix": {
          "description": "cu"
        }
      }
    }
    

    But that, as the query name suggests, is only going to work for prefixes. If you fancy an autocomplete that suggests terms regardless of where the substring matches occur, have a look at my other answer where I talk about leveraging n-grams.
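
As a rough sketch of the n-gram idea (not a reproduction of that other answer), the synonyms can be expanded into word prefixes at index time so that a plain match query on "cu" finds the document. The index name products_autocomplete, the filter and analyzer names, and the gram sizes are illustrative assumptions. The sketch uses the synonym filter rather than synonym_graph, because the Elasticsearch documentation recommends synonym_graph for search-time analysis only, and it uses edge_ngram, which covers only the prefixes of each word; matching in the middle of a word would need the ngram filter instead.

```console
# Hypothetical index: synonyms are expanded first, then every resulting
# token is split into prefixes of 2 to 10 characters at index time.
PUT products_autocomplete
{
  "settings": {
    "analysis": {
      "filter": {
        "product_synonyms": {
          "type": "synonym",
          "synonyms": ["yogurt, curd, dahi"]
        },
        "prefix_grams": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete_index": {
          "tokenizer": "standard",
          "filter": ["lowercase", "product_synonyms", "prefix_grams"]
        },
        "autocomplete_search": {
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "autocomplete_index",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}

# refresh=true makes the document searchable immediately.
POST products_autocomplete/_doc?refresh=true
{
  "description": "yogurt"
}

# The query text is only lowercased, so "cu" is looked up as a single term
# and should hit the "cu" prefix that was indexed for the synonym "curd".
GET products_autocomplete/_search
{
  "query": {
    "match": {
      "description": "cu"
    }
  }
}
```

The separate search_analyzer matters here: if the query were run through the index-time analyzer as well, the typed text would itself be expanded into synonyms and prefixes, which tends to match far more documents than intended. Note that this still returns matching documents, not the suggested word itself.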