
Solr Tokenizer that splits on case change


I am looking for a tokenizer that splits text on case changes. For example, indexing the text "HandMade" should index the tokens hand and made, so that searching for either hand or made returns the document.


Solution

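  • The WordDelimiterFilterFactory is what you want to use. Note that it is a token filter rather than a tokenizer, so it goes after the tokenizer in your analyzer chain. It can split on case changes (as well as on intra-word delimiters and letter/number transitions, depending on the arguments you use). See the docs here: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

    In your case, set splitOnCaseChange="1" (together with generateWordParts="1", so the split parts are actually emitted). Put any lowercase filter after it, not before: the filter needs the original casing to find the split points, and lowercasing afterwards is what turns "Hand" and "Made" into hand and made.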
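
    As a rough illustration of the advice above, a field type along these lines should do it. This is only a sketch: the name text_split_case is made up for the example, and the choice of WhitespaceTokenizerFactory and preserveOriginal="1" are assumptions, not something the question requires. Adjust them to match your schema.

    ```xml
    <!-- Hypothetical field type; the name "text_split_case" is an assumption for this example -->
    <fieldType name="text_split_case" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- The tokenizer leaves "HandMade" as one token so the filter below can split it -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Split on lower-to-upper case transitions and emit the parts ("Hand", "Made");
             preserveOriginal also keeps "HandMade" itself (optional) -->
        <filter class="solr.WordDelimiterFilterFactory"
                splitOnCaseChange="1"
                generateWordParts="1"
                preserveOriginal="1"/>
        <!-- Lowercase after splitting, so the indexed tokens are "hand" and "made" -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    ```

    You can check the resulting tokens for "HandMade" on the Analysis screen of the Solr admin UI before reindexing. On newer Solr versions, WordDelimiterFilterFactory is deprecated in favour of WordDelimiterGraphFilterFactory, which takes the same splitOnCaseChange argument, so check which one your version expects.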