I'm pretty sure this has to do with stemming, and I'm not sure what I need to change to get spelling suggestions to return whole words.
Settings are:
ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["standard", "lowercase", "stop_words", "cm_snow"]
                },
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_ngram"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_edgengram"]
                }
            },
            "tokenizer": {
                "haystack_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15,
                },
                "haystack_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15,
                    "side": "front"
                }
            },
            "filter": {
                "haystack_ngram": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15
                },
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15
                },
                "cm_snow": {
                    "type": "snowball",
                    "language": "English"
                },
                "stop_words": {
                    "type": "stop",
                    "ignore_case": True,
                    "stopwords": STOP_WORDS
                }
            }
        }
    }
}
If I do the following query to Elasticsearch:
curl -XPOST 'localhost:9200/listing/_suggest' -d '{
    "my-suggestion" : {
        "text" : "table",
        "term" : {
            "field" : "text"
        }
    }
}'
I get back:
{"text":"tabl","offset":0,"length":5,"options":[]}
Why is the result "tabl", even for a correctly-spelled word?
The problem is that I was using the default analyzer, which includes the snowball filter (cm_snow above), so the words were getting indexed as their stems. The term suggester works from the indexed terms, which is why it came back with "tabl".
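To make the cause concrete, here is a toy sketch in plain Python. It assumes nothing about Elasticsearch internals: `toy_stem` is a made-up stand-in for the snowball filter (it is not the Snowball algorithm), and `suggest` uses `difflib` from the standard library instead of the real term suggester. It only shows that suggestions can come from nothing but the terms that were actually indexed.

```python
import difflib


def toy_stem(word):
    """Crude illustrative stemmer. This is NOT the Snowball algorithm."""
    for suffix in ("es", "e", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def suggest(text, indexed_terms, size=5):
    """Return up to `size` indexed terms that are close to `text`."""
    return difflib.get_close_matches(text, indexed_terms, n=size, cutoff=0.6)


words = ["table", "tables", "chair"]

# What a stemming analyzer leaves in the index vs. a non-stemming one.
stemmed_index = sorted({toy_stem(w) for w in words})
whole_word_index = sorted(set(words))

print(stemmed_index)                        # ['chair', 'tabl']
print(suggest("tabls", stemmed_index))      # ['tabl'] (only stems are available)
print(suggest("tabls", whole_word_index))   # whole words such as 'table'
```

With the stemmed index, the only candidate near "tabls" is the stem itself, which matches the behaviour in the question.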
Because we still want to search on stemmed words, I added an extra field to my document called suggest that uses the standard analyzer. Into that, I put a text blob made up of the document's words (title, description, tags) and marked it as include_in_all=false.
Here's its mapping:
"suggest": {
    "type": "string",
    "analyzer": "standard"
},
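For reference, a minimal Python sketch of how the field could be declared and filled. The mapping snippet above does not show include_in_all, so the `SUGGEST_FIELD_MAPPING` dict below is an assumption about how that flag would be written out; `build_suggest_text` is a hypothetical helper name, and the title/description/tags arguments simply mirror the fields mentioned above.

```python
# Assumed shape of the extra field's mapping, with include_in_all spelled out.
SUGGEST_FIELD_MAPPING = {
    "suggest": {
        "type": "string",
        "analyzer": "standard",    # no stemming, so whole words are indexed
        "include_in_all": False,   # keep the blob out of the _all field
    }
}


def build_suggest_text(title, description, tags):
    """Join a document's text fields into one blob for the suggest field.

    Hypothetical helper: empty or whitespace-only parts are skipped.
    """
    parts = [title, description] + list(tags)
    return " ".join(p.strip() for p in parts if p and p.strip())


doc = {
    "title": "Oak table",
    "description": "A sturdy dining table",
    "tags": ["furniture", "wood"],
}
doc["suggest"] = build_suggest_text(doc["title"], doc["description"], doc["tags"])
print(doc["suggest"])  # Oak table A sturdy dining table furniture wood
```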
Then, in my query, I search against _all for the actual results but point the suggester at the suggest field:
{
    "query": {
        "match": {
            "_all": "tabel"
        }
    },
    "suggest": {
        "suggest-0": {
            "term": {
                "field": "suggest",
                "size": 5
            },
            "text": "tabls"
        }
    }
}
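If the request body is assembled in Python, it could look roughly like this. `build_search_body` is a hypothetical helper, and it assumes the same user text is sent to both the match query and the suggester; the field names and the "suggest-0" label are taken from the query above.

```python
import json


def build_search_body(user_text, suggestion_count=5):
    """Build a search body that queries _all and asks for term suggestions.

    Hypothetical helper: results come from the stemmed _all field, while
    suggestions come from the non-stemmed suggest field.
    """
    return {
        "query": {"match": {"_all": user_text}},
        "suggest": {
            "suggest-0": {
                "text": user_text,
                "term": {"field": "suggest", "size": suggestion_count},
            }
        },
    }


body = build_search_body("tabls")
print(json.dumps(body, indent=2))
```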
Which gives:
{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
    },
    "suggest": {
        "suggest-0": [
            {
                "text": "tabls",
                "offset": 0,
                "length": 5,
                "options": [
                    {
                        "text": "table",
                        "score": 0.8,
                        "freq": 858
                    },
                    {
                        "text": "tables",
                        "score": 0.8,
                        "freq": 682
                    },
                    {
                        "text": "tails",
                        "score": 0.8,
                        "freq": 4
                    },
                    {
                        "text": "tabs",
                        "score": 0.75,
                        "freq": 4
                    },
                    {
                        "text": "tools",
                        "score": 0.6,
                        "freq": 176
                    }
                ]
            }
        ]
    }
}
My UI code then presents the suggestion to the user so they can make better searches.
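A rough sketch of that last step, assuming the response has the shape shown above. `best_suggestion` is a hypothetical helper, and ranking by score and then by frequency is just one reasonable choice, not necessarily what my UI does.

```python
def best_suggestion(response, name="suggest-0"):
    """Return a corrected query string, or None if nothing was suggested.

    Hypothetical helper: for each token, pick the option with the highest
    score, breaking ties by how often the term occurs in the index.
    """
    entries = response.get("suggest", {}).get(name, [])
    corrected = []
    changed = False
    for entry in entries:
        options = entry.get("options", [])
        if options:
            best = max(options, key=lambda o: (o["score"], o["freq"]))
            corrected.append(best["text"])
            changed = True
        else:
            # No options for this token, so keep the text as returned.
            corrected.append(entry["text"])
    return " ".join(corrected) if changed else None


# Trimmed copy of the response above.
response = {
    "suggest": {
        "suggest-0": [
            {
                "text": "tabls",
                "offset": 0,
                "length": 5,
                "options": [
                    {"text": "table", "score": 0.8, "freq": 858},
                    {"text": "tables", "score": 0.8, "freq": 682},
                    {"text": "tabs", "score": 0.75, "freq": 4},
                ],
            }
        ]
    }
}

print(best_suggestion(response))  # table
```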