Tags: elasticsearch, autocomplete, elasticsearch-suggester

Elasticsearch suggest autocompletion: unexpected result


I'm struggling to understand a result I'm getting while using the suggest API.

My goal is for this result not to be returned.

To reproduce, here is my mapping:


PUT /movies
{
  "settings": {
    "analysis": {
      "filter": {
        "true_false_filter": {
          "type": "keep",
          "keep_words": [
            "true",
            "false"
          ]
        },
        "french_elision": {
          "type": "elision",
          "articles_case": false,
          "articles": [
            "puisqu"
          ]
        },
        "french_stemmer": {
          "type": "stemmer",
          "language": "light_french"
        },
        "organic-dictionary": {
          "type": "synonym",
          "expand": true,
          "lenient": true,
          "synonyms": [
            "non bio"
          ]
        },
        "french_stop_filter": {
          "type": "stop",
          "ignore_case": true,
          "stopwords": "_french_"
        }
      },
      "analyzer": {
        "lowercase_stop_analyzer": {
          "tokenizer": "lowercase",
          "filter": [
            "french_stop_filter"
          ]
        },
        "lowercase_asciifolding": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        },
        "french_analyzer_custom": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "french_elision",
            "french_stemmer"
          ]
        },
        "custom_organic_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "french_elision",
            "organic-dictionary",
            "true_false_filter",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "attr": {
        "type": "text",
        "analyzer": "french_analyzer_custom"
      },
      "brand_name": {
        "type": "keyword"
      },
      "brand_name_suggest": {
        "type": "completion",
        "analyzer": "lowercase_stop_analyzer",
        "search_analyzer": "lowercase_asciifolding",
        "preserve_separators": false,
        "preserve_position_increments": false,
        "max_input_length": 50
      }
    }
  }
}

Then I put a document in the index:

POST /movies/_doc/1001
{
    "brand_name": "A LE MOUTON HUILE D'OLIVE",
    "brand_name_suggest": [
      "A LE MOUTON HUILE D'OLIVE"
    ]
}

Then I run my search:

GET movies/_search
{
  "explain": true, 
  "suggest": {
    "completer": {
      "text": "amo",
      "completion": {
        "field": "brand_name_suggest",
        "size": 20,
        "skip_duplicates": true
      }
    }
  }
}

My question: why is this document found when searching for "amo"?

And how can I prevent it from being returned?

Thanks in advance


Solution

  • The brand_name_suggest field uses lowercase_stop_analyzer at index time, which removes French stop words, so A LE MOUTON HUILE D'OLIVE is analyzed as a, mouton, huile, olive, i.e. LE is removed.

    Because the field also sets preserve_separators and preserve_position_increments to false, the remaining tokens are matched with no separator between them and no gap where the stop word was. So at search time, amo matches the start of the first two tokens (a followed by mouton), which is why you're getting this document. To prevent this, remove french_stop_filter from your index-time analyzer.

    Another issue that may bite you later: your search analyzer lowercase_asciifolding applies asciifolding but your index-time analyzer does not, so if you index words with accents, you might not find them at search time either.
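
    To make the mechanism above concrete, here is a rough Python sketch that imitates what the index-time analysis does. It is not Elasticsearch code and only approximates the real behavior: the stop-word set is a small hand-picked subset of the _french_ list, the function names (index_tokens, suggest_key, matches) are made up for illustration, and the search-side analysis is simplified to lowercasing.

```python
import re

# Assumption: a tiny illustrative subset of the _french_ stop-word list.
# The real list is much longer; only entries relevant to this example are here.
FRENCH_STOPWORDS_SUBSET = {"le", "la", "les", "de", "d", "l"}


def index_tokens(text, remove_stopwords=True):
    """Approximate lowercase_stop_analyzer: split on non-letters,
    lowercase, then optionally drop stop words."""
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", text)]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in FRENCH_STOPWORDS_SUBSET]
    return tokens


def suggest_key(tokens):
    """Approximate preserve_separators=false and
    preserve_position_increments=false: tokens are joined with
    nothing between them and removed tokens leave no gap."""
    return "".join(tokens)


def matches(prefix, text, remove_stopwords=True):
    """True if the (lowercased) prefix matches the start of the joined tokens."""
    key = suggest_key(index_tokens(text, remove_stopwords))
    return key.startswith(prefix.lower())


brand = "A LE MOUTON HUILE D'OLIVE"
print(index_tokens(brand))                            # tokens kept at index time
print(matches("amo", brand))                          # with the stop filter
print(matches("amo", brand, remove_stopwords=False))  # without the stop filter
```

    With the stop filter, the joined tokens begin with "amouton", so "amo" is a prefix; without it, they begin with "alemouton" and "amo" no longer matches. To see the real tokens, you can run the _analyze API against the index with the lowercase_stop_analyzer analyzer.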