
Elasticsearch analyzer doesn't replace the apostrophes (')


Using Elasticsearch v7.0
This is the analyzer I've implemented (http://phoenyx2:9200/search_dev/_settings?pretty=true):

{
    "search_dev": {
        "settings": {
            "index": {
                "refresh_interval": "30s",
                "number_of_shards": "1",
                "provided_name": "search_dev",
                "creation_date": "1558444846417",
                "analysis": {
                    "analyzer": {
                        "my_standard": {
                            "filter": [
                                "lowercase"
                            ],
                            "char_filter": [
                                "my_char_filter"
                            ],
                            "tokenizer": "standard"
                        }
                    },
                    "char_filter": {
                        "my_char_filter": {
                            "type": "mapping",
                            "mappings": [
                                "' => "
                            ]
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "hYz0ZlWFTDKearW1rpx8lw",
                "version": {
                    "created": "7000099"
                }
            }
        }
    }
}

I've recreated the whole index, and there is still no change in the analysis.
I've also run this against phoenyx2:9200/search_dev/_analyze:

{
    "analyzer":"my_standard",
    "field":"stakeholderName",
    "text": "test't"
}

The reply was:

{
    "tokens": [
        {
            "token": "test't",
            "start_offset": 0,
            "end_offset": 6,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}

I was hoping the returned token would be "testt".


Solution

  • When you re-create an index, it's not enough to define a new analyzer in the settings.

    You also have to specify in the mapping which analyzer each field uses, for example:

       "mappings": {
           "properties": {
               "stakeholderName": {
                   "type": "text",
                   "analyzer": "my_standard"
               }
           }
       }
    

    Your mapping probably looks like this:

       "mappings": {
           "properties": {
               "stakeholderName": {
                   "type": "text"
               }
           }
       }
    

    Basically, if you run your "analyze" test again and drop the "field" parameter:

    {
        "analyzer":"my_standard",
        "text": "test't"
    }
    

    You'll get:

    {
      "token": "testt",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    }
    

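    That's what you expected, so the analyzer itself works. When the request names a field, "_analyze" apparently uses the analyzer mapped to that field, which in your case is still the default standard analyzer, so "my_standard" was never applied. So bad news, buddy: you have to re-index all your data, and this time specify in the mapping which analyzer each field should use; otherwise Elasticsearch will default to its standard analyzer every time.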
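A minimal sketch of a complete create-index body that combines the analyzer settings from the question with the mapping from this answer. It assumes the old index has been deleted and that this body is sent with PUT to phoenyx2:9200/search_dev. Only "stakeholderName" is mapped here; any other fields your documents have would need their own entries. The read-only values from the "_settings" output (uuid, creation_date, provided_name, version) are left out, since Elasticsearch generates those itself.

```json
{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
        "refresh_interval": "30s",
        "analysis": {
            "char_filter": {
                "my_char_filter": {
                    "type": "mapping",
                    "mappings": [
                        "' => "
                    ]
                }
            },
            "analyzer": {
                "my_standard": {
                    "type": "custom",
                    "char_filter": [
                        "my_char_filter"
                    ],
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "stakeholderName": {
                "type": "text",
                "analyzer": "my_standard"
            }
        }
    }
}
```

With this mapping in place, the original "_analyze" request that includes "field": "stakeholderName" should also return "testt". If your documents still exist in another index, the "_reindex" API (POST _reindex with a "source" and a "dest" index) is one way to copy them into the new index; otherwise re-send them from wherever they originally came from.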