Tags: elasticsearch, tokenize

Elasticsearch - tokenize terms at capital letters, for example "TheStarTech" => [The, Star, Tech]


Does Elasticsearch provide a tokenizer that splits terms at capital letters, for example tokenizing TheStarTech into the terms [The, Star, Tech]? The pattern tokenizer seems helpful; any suggestions?


Solution

  • See this: Word Delimiter Token Filter

    It does what you want and more. You can pass in whichever parameters fit your needs; check the split_on_case_change parameter, which is true by default.
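
    As a quick sketch of how this could look (the request below uses the _analyze API purely for testing and defines the filter inline rather than in index settings; the keyword tokenizer keeps the input intact so the filter alone does the splitting):

    ```
    POST _analyze
    {
      "tokenizer": "keyword",
      "filter": [
        {
          "type": "word_delimiter",
          "split_on_case_change": true
        }
      ],
      "text": "TheStarTech"
    }
    ```

    This should return the tokens The, Star, and Tech. To apply the same behavior at index time, add the filter to a custom analyzer in your index settings.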