Elasticsearch: ignore accents on search


I have an Elasticsearch index with customer information.

I am having trouble finding results that contain accents.

For example, I have {name: 'anais'} and {name: 'anaïs'}.

Running

GET /my-index/_search
{
  "size": 25, 
  "query": {
    "match": {"name": "anaïs"}
  }
}

I would like to get both documents for this query, but I only get anaïs.

GET /my-index/_search
{
  "size": 25, 
  "query": {
    "match": {"name": "anais"}
  }
}

Here too I would like to get both anais and anaïs, but I only get anais.

I tried adding an analyzer:

PUT /my-new-celebrity/_settings
{
  "analysis": {
    "analyzer": {
      "default": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
  }
}

But in this case, both searches return only anais.


Solution

  • It looks like you forgot to apply your custom default analyzer to your name field. Below is a working example:

    Index definition with mappings and settings (note the analyzer set on the name field):

    {
        "settings": {
            "analysis": {
                "analyzer": {
                    "default": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "asciifolding"
                        ]
                    }
                }
            }
        },
        "mappings" : {
            "properties" :{
                "name" : {
                    "type" : "text",
                    "analyzer" : "default"
                }
            }
        }
    }
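
To see why this filter chain makes the two spellings equivalent, here is a minimal Python sketch that approximates standard tokenizer, then lowercase, then asciifolding, using only the standard library. It is a stand-in under stated assumptions, not Lucene's implementation: `fold_to_ascii` and `analyze` are hypothetical helper names, the regex split is cruder than the real standard tokenizer, and stripping combining marks only covers accented letters rather than everything asciifolding handles.

```python
import re
import unicodedata
from typing import List


def fold_to_ascii(text: str) -> str:
    """Approximate asciifolding: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> List[str]:
    """Rough stand-in for: standard tokenizer -> lowercase -> asciifolding."""
    # Compose first so an accent typed as a separate combining mark stays inside its word.
    text = unicodedata.normalize("NFC", text)
    tokens = re.findall(r"\w+", text)  # crude word split, not the real standard tokenizer
    return [fold_to_ascii(token.lower()) for token in tokens]


print(analyze("anaïs"))  # ['anais']
print(analyze("Anais"))  # ['anais']
```

Both spellings reduce to the same term, which is what lets either query match either document. To inspect the real token stream, the `_analyze` API can be called on the index with the same analyzer and text.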
    

    Index the sample documents:

    {
       "name" : "anais"
    }
    
    {
       "name" : "anaïs"
    }
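
The answer does not show the requests used to index these two documents. One possible way, sketched here as an assumption, is to send both in a single `_bulk` request. The hypothetical helper `build_bulk_body` below only builds the newline-delimited JSON body; the ids "1" and "2" and the index name `myindexascii` are taken from the search results shown in this answer.

```python
import json


def build_bulk_body(docs: dict) -> str:
    """Build an NDJSON body for the _bulk API from a mapping of _id -> document."""
    lines = []
    for doc_id, source in docs.items():
        # Action line first, then the document source on its own line.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source, ensure_ascii=False))  # keep "ï" readable
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


sample_docs = {"1": {"name": "anaïs"}, "2": {"name": "anais"}}
print(build_bulk_body(sample_docs), end="")
```

The resulting body would be sent as `POST /myindexascii/_bulk` with the header `Content-Type: application/x-ndjson`, encoded as UTF-8.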
    

    Search query, same as yours:

    {
        "size": 25,
        "query": {
            "match": {
                "name": "anaïs"
            }
        }
    }
    

    And the search returns both documents, as expected:

    "hits": [
        {
            "_index": "myindexascii",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.18232156,
            "_source": {
                "name": "anaïs"
            }
        },
        {
            "_index": "myindexascii",
            "_type": "_doc",
            "_id": "2",
            "_score": 0.18232156,
            "_source": {
                "name": "anais"
            }
        }
    ]
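
Both documents come back because a match query analyzes the query text with the same analyzer that was used on the field at index time, so the stored terms and the query term are folded to the same form. The toy model below sketches that idea with a tiny in-memory inverted index. Everything in it is an illustrative assumption (the helper names `analyze`, `build_index` and `match`, the crude tokenization, and the absence of any scoring); it is not how Elasticsearch is implemented.

```python
import re
import unicodedata
from collections import defaultdict


def analyze(text):
    """Rough stand-in for standard tokenizer -> lowercase -> asciifolding."""
    text = unicodedata.normalize("NFC", text)
    terms = []
    for token in re.findall(r"\w+", text):
        decomposed = unicodedata.normalize("NFKD", token.lower())
        terms.append("".join(ch for ch in decomposed if not unicodedata.combining(ch)))
    return terms


def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, source in docs.items():
        for term in analyze(source["name"]):
            index[term].add(doc_id)
    return index


def match(index, query):
    """Return ids of documents sharing at least one analyzed term with the query."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return sorted(hits)


docs = {"1": {"name": "anaïs"}, "2": {"name": "anais"}}
index = build_index(docs)
print(match(index, "anaïs"))  # ['1', '2']
print(match(index, "anais"))  # ['1', '2']
```

In this model both documents are stored under the single term `anais`, so the accented and unaccented queries return the same hits.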