Tags: curl, elasticsearch, diacritics, elasticsearch-analyzers

Custom analyzer not working in elasticsearch


I am running Elasticsearch version 1.6.

I am trying to set up a custom analyzer for my index in Elasticsearch. Some of the properties in my index contain accented and special characters.

For example, one of my properties, name, has a value like "name" => "Está loca". What I want is that a search such as http://localhost:9200/tutorial/helloworld/_search?q=esta returns the document containing "Está loca".

I followed the guide below and configured the analyzer it describes: https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html

curl -XPUT 'localhost:9200/tutorial?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "helloworld": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "standard",
          "fields": {
            "folded": {
              "type": "string",
              "analyzer": "folding"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}'

I created the index with this configuration and then added a few test documents:

curl -X POST 'http://localhost:9200/tutorial/helloworld/1' -d '{ "name": "Está loca!" }'
curl -X POST 'http://localhost:9200/tutorial/helloworld/2' -d '{ "name": "Está locá!" }'

But when I search with http://localhost:9200/tutorial/helloworld/_search?q=esta, nothing is returned. I want users to get the same results whichever form they type, for example plain unaccented English characters. Can anybody help me achieve this? I have been struggling with it for the past week.


Solution

  • You cannot find the keyword esta by searching the _all field, because by default Elasticsearch applies only the standard analyzer when building the _all field.

    Take your query:

    GET folding_index1/helloworld/_search?q=esta
    

    This is effectively equivalent to the following match query in the Elasticsearch query DSL:

    GET folding_index1/helloworld/_search
    {
      "query": {
        "match": {
          "_all": "esta"
        }
      }
    }
    

    It searches the _all field, which contains only the unfolded token está, so it cannot find the folded token stored for name.folded.
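To make the mismatch concrete, here is a rough Python approximation of the two analysis chains. It is only a sketch: the function names standard_tokens and folded_tokens are invented for illustration, and Unicode decomposition stands in for Lucene's asciifolding filter, which handles more characters than this does.

```python
import re
import unicodedata


def standard_tokens(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    # Normalize to composed form so an accented letter is a single character.
    text = unicodedata.normalize("NFC", text)
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def folded_tokens(text):
    """Rough stand-in for standard tokenizer + lowercase + asciifolding."""
    folded = []
    for tok in standard_tokens(text):
        # Decompose accented letters, then drop the combining accent marks.
        decomposed = unicodedata.normalize("NFKD", tok)
        folded.append("".join(ch for ch in decomposed if not unicodedata.combining(ch)))
    return folded


doc = "Est\u00e1 loca!"  # "Está loca!" written with an explicit escape

# What _all (standard analyzer) would hold, versus name.folded.
print(standard_tokens(doc))
print(folded_tokens(doc))

# The search term "esta" only appears among the folded tokens.
print("esta" in standard_tokens(doc))
print("esta" in folded_tokens(doc))
```

Under these assumptions the term esta is present only in the folded token list, which is why a search against _all finds nothing while a search against name.folded can match.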

    You can try the following mapping, but even with include_in_all set on the multi-field, the _all field is still analyzed with the standard analyzer.

    PUT folding_index1
    {
        "mappings": {
            "helloworld": {
                "properties": {
                    "name": {
                        "type": "string",
                        "analyzer": "standard",
                        "fields": {
                            "folded": {
                                "type": "string",
                                "analyzer": "folding",
                                "include_in_all": true
                            }
                        }
                    }
                }
            }
        },
        "settings": {
            "analysis": {
                "analyzer": {
                    "folding": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding"]
                    }
                }
            }
        }
    }
    

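    A query like the following should work for you, because it targets the folded sub-field directly. For more background, see the documentation on the _all field analyzer.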

    POST folding_index1/_search?q=name.folded:esta
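The URI search above can also be expressed as a request body. The Python sketch below only builds and prints the JSON. The helper names folded_match_query and most_fields_query are hypothetical, and the second helper follows the multi_match / most_fields pattern from the asciifolding guide linked in the question, adapted here to the name and name.folded fields.

```python
import json


def folded_match_query(text, field="name.folded"):
    """Body for a match query against the folded sub-field."""
    return {"query": {"match": {field: text}}}


def most_fields_query(text, fields=("name", "name.folded")):
    """Body that searches the original and folded fields together.

    Documents matching on both fields score higher, so exact accent
    matches should rank above folded-only matches.
    """
    return {
        "query": {
            "multi_match": {
                "type": "most_fields",
                "query": text,
                "fields": list(fields),
            }
        }
    }


print(json.dumps(folded_match_query("esta"), indent=2))
print(json.dumps(most_fields_query("esta"), indent=2))
```

Either printed body could be sent to folding_index1/_search. Because a match query analyzes the search text with the target field's analyzer, both esta and está should fold to the same token when run against name.folded.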