I'm struggling to understand a result I'm getting while using the suggest API.
My goal is to keep this result from being returned.
How to reproduce: here is my mapping:
PUT /movies
{
  "settings": {
    "analysis": {
      "filter": {
        "true_false_filter": {
          "type": "keep",
          "keep_words": [
            "true",
            "false"
          ]
        },
        "french_elision": {
          "type": "elision",
          "articles_case": false,
          "articles": [
            "puisqu"
          ]
        },
        "french_stemmer": {
          "type": "stemmer",
          "language": "light_french"
        },
        "organic-dictionary": {
          "type": "synonym",
          "expand": true,
          "lenient": true,
          "synonyms": [
            "non bio"
          ]
        },
        "french_stop_filter": {
          "type": "stop",
          "ignore_case": true,
          "stopwords": "_french_"
        }
      },
      "analyzer": {
        "lowercase_stop_analyzer": {
          "tokenizer": "lowercase",
          "filter": [
            "french_stop_filter"
          ]
        },
        "lowercase_asciifolding": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        },
        "french_analyzer_custom": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "french_elision",
            "french_stemmer"
          ]
        },
        "custom_organic_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "french_elision",
            "organic-dictionary",
            "true_false_filter",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "attr": {
        "type": "text",
        "analyzer": "french_analyzer_custom"
      },
      "brand_name": {
        "type": "keyword"
      },
      "brand_name_suggest": {
        "type": "completion",
        "analyzer": "lowercase_stop_analyzer",
        "search_analyzer": "lowercase_asciifolding",
        "preserve_separators": false,
        "preserve_position_increments": false,
        "max_input_length": 50
      }
    }
  }
}
Then I put a document in the index:
POST /movies/_doc/1001
{
  "brand_name": "A LE MOUTON HUILE D'OLIVE",
  "brand_name_suggest": [
    "A LE MOUTON HUILE D'OLIVE"
  ]
}
Then I run my search:
GET movies/_search
{
  "explain": true,
  "suggest": {
    "completer": {
      "text": "amo",
      "completion": {
        "field": "brand_name_suggest",
        "size": 20,
        "skip_duplicates": true
      }
    }
  }
}
My issue: why is this document found when searching for "amo"?
And how can I prevent it from being returned?
Thanks in advance
Since brand_name_suggest uses the lowercase_stop_analyzer, which removes French stop words, A LE MOUTON HUILE D'OLIVE is analyzed as a, mouton, huile, olive, i.e. LE is removed.
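You can check which tokens the index-time analyzer actually emits with the _analyze API. Running it against the movies index makes the custom analyzer available; a request along these lines should list the tokens produced for the document's input:

```console
POST /movies/_analyze
{
  "analyzer": "lowercase_stop_analyzer",
  "text": "A LE MOUTON HUILE D'OLIVE"
}
```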
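So at search time, when you type amo, it matches the first two tokens. Because the field sets preserve_separators to false, the separator between a and mouton is ignored, and because preserve_position_increments is false, the gap left by the removed LE is ignored as well, so the indexed input effectively starts with amouton. That's why you're getting this document. If you want to prevent this, you need to remove the french_stop_filter from your index-time analyzer.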
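A minimal sketch of an index-time setup without the stop filter. The index name movies_v2 and the analyzer name lowercase_only_analyzer are hypothetical, and only the suggest field is shown. Since the analyzer of an existing field can't be changed in place, this assumes you create a new index and copy the documents over with the _reindex API:

```console
PUT /movies_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_only_analyzer": {
          "type": "custom",
          "tokenizer": "lowercase"
        },
        "lowercase_asciifolding": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["asciifolding", "lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "brand_name_suggest": {
        "type": "completion",
        "analyzer": "lowercase_only_analyzer",
        "search_analyzer": "lowercase_asciifolding",
        "preserve_separators": false,
        "preserve_position_increments": false,
        "max_input_length": 50
      }
    }
  }
}

POST /_reindex
{
  "source": { "index": "movies" },
  "dest": { "index": "movies_v2" }
}
```

With LE kept, the input starts with a, le, mouton, so even with separators ignored it reads as alemouton and the prefix amo should no longer match.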
Besides, another issue that might bite you later is that your search analyzer lowercase_asciifolding does ASCII folding but your index-time analyzer doesn't, so if you index words with accents, you might not find them at search time.
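To see the mismatch, you can run both analyzers on the same accented word. Crème is just an illustrative input, not something from your data:

```console
POST /movies/_analyze
{
  "analyzer": "lowercase_stop_analyzer",
  "text": "Crème"
}

POST /movies/_analyze
{
  "analyzer": "lowercase_asciifolding",
  "text": "Crème"
}
```

The first request should return crème (the lowercase tokenizer keeps the accent) and the second creme, so the search-side token won't line up with the indexed one past the unaccented prefix. Adding asciifolding to the filter list of the index-time analyzer (in a new index, then reindexing) would make both sides emit creme.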