
Elasticsearch - how do I remove s from end of words


Using Elasticsearch 2.2, as a simple experiment, I want to remove the last character from any word that ends with the lowercase character "s". For example, the word "sounds" would be indexed as "sound".

I'm defining my analyzer like this:

{
  "template": "document-index-template",
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "sFilter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z]+)([s]( |$))",
          "replacement": "$2"
        }
      },
      "analyzer": {
        "tight": {
          "type": "standard",
          "filter": [
            "sFilter",
            "lowercase"
          ]
        }
      }
    }
  }
}
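Even if sFilter were applied, this pattern and replacement would not produce "sound". Elasticsearch's pattern_replace filter is built on Lucene's PatternReplaceFilter, which uses java.util.regex, so the behaviour can be checked with a small Java sketch. It assumes the filter sees one token at a time and replaces every match (the filter's default); the class and method names are illustrative only.

```java
import java.util.List;
import java.util.regex.Pattern;

public class OriginalPatternDemo {
    // Pattern from sFilter in the question.
    private static final Pattern S_FILTER = Pattern.compile("([a-zA-Z]+)([s]( |$))");

    // Applies the pattern to a single token with the question's replacement "$2".
    // replaceAll mirrors pattern_replace's default of replacing every match.
    static String apply(String token) {
        return S_FILTER.matcher(token).replaceAll("$2");
    }

    public static void main(String[] args) {
        // Tokens of "sounds of silences", each matched on its own.
        for (String token : List.of("sounds", "of", "silences")) {
            System.out.println(token + " -> " + apply(token));
        }
        // Prints:
        // sounds -> s
        // of -> of
        // silences -> s
    }
}
```

Group 2 captures only the trailing "s" (plus an empty match for `$`), so "$2" replaces the whole word with "s". The stem is in group 1.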

Then, when I analyze the text "sounds of silences" with this request:

<index>/_analyze?analyzer=tight&text=sounds%20of%20silences

I get:

{
   "tokens": [
      {
         "token": "sounds",
         "start_offset": 0,
         "end_offset": 6,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "of",
         "start_offset": 7,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "silences",
         "start_offset": 10,
         "end_offset": 18,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}

I expected "sounds" to become "sound" and "silences" to become "silence".


Solution

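  • The analyzer definition above does not do what you intend. An analyzer of type standard only accepts its own settings (such as stopwords and max_token_length) and does not take a filter list, so sFilter is never applied and the tokens come back unchanged. What you want is an analyzer of type custom with tokenizer set to standard.

    The pattern needs a fix as well. Token filters see one token at a time, so the ( |$) alternative is unnecessary, and the replacement $2 would keep only the trailing "s" rather than the stem captured by $1.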

    Example:

    {
      "settings": {
        "number_of_shards": 1,
        "analysis": {
          "filter": {
            "sFilter": {
              "type": "pattern_replace",
              "pattern": "([a-zA-Z]+)s$",
              "replacement": "$1"
            }
          },
          "analyzer": {
            "tight": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "sFilter",
                "lowercase"
              ]
            }
          }
        }
      }
    }
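One way to sanity-check the corrected pattern before creating the index is to run it through java.util.regex, the regex engine behind Lucene's PatternReplaceFilter. This is only an approximation: it splits on whitespace instead of running the real standard tokenizer, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TrailingSFilterDemo {
    // Corrected sFilter pattern from the answer.
    private static final Pattern S_FILTER = Pattern.compile("([a-zA-Z]+)s$");

    // Applies the pattern to one token with replacement "$1" (the stem).
    static String apply(String token) {
        return S_FILTER.matcher(token).replaceAll("$1");
    }

    // Rough stand-in for the standard tokenizer: splits on whitespace only.
    // The real tokenizer follows Unicode word-boundary rules.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            out.add(apply(token));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("sounds of silences")); // [sound, of, silence]
        // Any token ending in "s" is trimmed, not just plurals.
        System.out.println(analyze("this is glass"));      // [thi, i, glas]
    }
}
```

The second line shows the limits of a plain regex: "this", "is" and "glass" also lose their final "s", which is fine for the experiment in the question but not a general plural rule.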