elasticsearch django-haystack spelling

In Elasticsearch spelling suggestions are coming back as stems


I'm pretty sure this has to do with stemming, but I'm not sure what I need to change to get spelling suggestions to return whole words.

Settings are:

ELASTICSEARCH_INDEX_SETTINGS = {
  'settings': {
    "analysis": {
        "analyzer": {
            "default": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["standard", "lowercase", "stop_words", "cm_snow"]
            },
            "ngram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_ngram"]
            },
            "edgengram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_edgengram"]
            }
        },
        "tokenizer": {
            "haystack_ngram_tokenizer": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15,
            },
            "haystack_edgengram_tokenizer": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15,
                "side": "front"
            }
        },
        "filter": {
            "haystack_ngram": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15
            },
            "haystack_edgengram": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15
            },
            "cm_snow": {
                "type": "snowball",
                "language": "English"
            },
            "stop_words": {
                "type": "stop",
                "ignore_case": True,
                "stopwords": STOP_WORDS
            }
        }
    }
  }
}

If I send the following query to Elasticsearch:

curl -XPOST 'localhost:9200/listing/_suggest' -d '{
  "my-suggestion" : {
    "text" : "table",
    "term" : {
      "field" : "text"
    }
  }
}'

I get back:

{"text":"tabl","offset":0,"length":5,"options":[]}

Why is the result "tabl", even for a correctly-spelled word?


Solution

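  • The problem was that the field used the default analyzer, which includes the snowball stemming filter (cm_snow), so words were indexed as their stems. The term suggester draws its suggestions from the terms stored in the index for that field, so it could only ever return stems. It also analyzes the input text with the field's analyzer, which is why even the correctly spelled "table" came back as "tabl".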
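
To see why stemming at index time leaks into suggestions, here is a toy sketch in Python. It is not Elasticsearch's algorithm: difflib stands in for the term suggester's edit-distance matching, and the two term lists are made-up stand-ins for what a stemmed field and an unstemmed field might hold. The only point is that a suggester can return nothing but terms that exist in the index.

```python
import difflib

# Hypothetical index contents: the same documents indexed two ways.
STEMMED_TERMS = ["tabl", "chair", "lamp"]                 # after a stemming filter
WHOLE_WORD_TERMS = ["table", "tables", "chair", "lamp"]   # no stemming applied


def toy_suggest(text, indexed_terms, size=5):
    """Return indexed terms close to `text` (toy stand-in for a term suggester)."""
    return difflib.get_close_matches(text, indexed_terms, n=size, cutoff=0.6)


print(toy_suggest("tabls", STEMMED_TERMS))     # only stems can come back
print(toy_suggest("tabls", WHOLE_WORD_TERMS))  # whole words can come back
```

With the stemmed list the only candidate is a stem; with the whole-word list the candidates are real words, which is what the extra field described below achieves.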

    Because we still want to search on stemmed words, I added an extra field to my document called suggest that uses the standard analyzer. Into that field I put a text blob made up of the document's words (title, description, tags) and marked it as include_in_all=false. Here's its mapping:

    "suggest": {
        "type": "string",
        "analyzer": "standard"
    },
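
As a rough sketch of how the suggest blob might be assembled before indexing: the helper name build_suggest_text and its title/description/tags arguments are assumptions for illustration, not django-haystack API. The include_in_all key in the dict follows the description in the text; the posted mapping fragment does not show it.

```python
# Mapping for the extra field, written as a Python dict.
# "include_in_all": False follows the prose description; the posted
# fragment omits it, so treat that key as an assumption.
SUGGEST_FIELD_MAPPING = {
    "suggest": {
        "type": "string",
        "analyzer": "standard",
        "include_in_all": False,
    }
}


def build_suggest_text(title, description, tags):
    """Join the document's human-readable fields into one unstemmed text blob."""
    parts = [title, description, " ".join(tags)]
    # Skip empty or missing fields and collapse runs of whitespace.
    return " ".join(" ".join(p.split()) for p in parts if p and p.strip())


# Hypothetical document, only to show the shape of the result.
doc = {
    "title": "Oak dining table",
    "description": "Seats six.  Solid wood.",
    "tags": ["tables", "furniture"],
}
doc["suggest"] = build_suggest_text(doc["title"], doc["description"], doc["tags"])
print(doc["suggest"])
```

Because the blob is analyzed with the standard analyzer, words such as "table" and "tables" stay distinct in the index instead of collapsing to one stem.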
    

    Then, in my request, I query against _all for the actual search results but use the suggest field for the suggestions:

    {
      "query": {
         "match": {
             "_all": "tabel"
         }
      },
      "suggest": {
        "suggest-0": {
          "term": {
            "field": "suggest",
            "size": 5
          },
          "text": "tabls"
        }
      }
    }
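
A request body like this could be built from application code along the following lines. This is a minimal sketch: build_search_body is a hypothetical helper, and it passes the same user input to both the query and the suggester, which is an assumption (the posted request uses a different string in each part).

```python
import json


def build_search_body(user_text, suggest_field="suggest", size=5):
    """Build one request body: full-text search plus term suggestions."""
    return {
        # Search hits come from the _all field.
        "query": {"match": {"_all": user_text}},
        # Suggestions come from the separate field analyzed with "standard".
        "suggest": {
            "suggest-0": {
                "text": user_text,
                "term": {"field": suggest_field, "size": size},
            }
        },
    }


print(json.dumps(build_search_body("tabls"), indent=2))
```

Keeping the suggester pointed at its own field means the stemming analyzer used for search can change without affecting what the suggestions look like.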
    

    Which gives:

    {
        "took": 7,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 0,
            "max_score": null,
            "hits": []
        },
        "suggest": {
            "suggest-0": [
                {
                    "text": "tabls",
                    "offset": 0,
                    "length": 5,
                    "options": [
                        {
                            "text": "table",
                            "score": 0.8,
                            "freq": 858
                        },
                        {
                            "text": "tables",
                            "score": 0.8,
                            "freq": 682
                        },
                        {
                            "text": "tails",
                            "score": 0.8,
                            "freq": 4
                        },
                        {
                            "text": "tabs",
                            "score": 0.75,
                            "freq": 4
                        },
                        {
                            "text": "tools",
                            "score": 0.6,
                            "freq": 176
                        }
                    ]
                }
            ]
        }
    }
    

    My UI code then presents these suggestions to the user so they can refine their search.
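
The UI step is not shown in the answer, so the following is only a guess at what it might look like. best_suggestion is a hypothetical helper that picks one option from a response shaped like the one above, preferring the highest score and breaking ties by term frequency; it only looks at the first entry that has options, so it is aimed at single-word queries.

```python
def best_suggestion(response, name="suggest-0"):
    """Return the top suggested word from a search response, or None."""
    for entry in response.get("suggest", {}).get(name, []):
        options = entry.get("options", [])
        if options:
            # Prefer the highest score; break ties by how often the term occurs.
            top = max(options, key=lambda o: (o["score"], o["freq"]))
            return top["text"]
    return None


# Trimmed copy of the response shown above.
response = {
    "hits": {"total": 0, "max_score": None, "hits": []},
    "suggest": {
        "suggest-0": [
            {
                "text": "tabls", "offset": 0, "length": 5,
                "options": [
                    {"text": "table", "score": 0.8, "freq": 858},
                    {"text": "tables", "score": 0.8, "freq": 682},
                    {"text": "tools", "score": 0.6, "freq": 176},
                ],
            }
        ]
    },
}

word = best_suggestion(response)
if response["hits"]["total"] == 0 and word:
    print("Did you mean: " + word + "?")  # prints: Did you mean: table?
```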