I am trying to setup a search engine using Solr (or Lucene) which could have text in both Latin with special chars, (special chars would include Ö or Ç as an example) or Cyrilic chars (examples include Б or б and Ж ж).
Anyway, I am trying to find a solution to allow me to search for words with these charicters in them, but for users who do not have the key on their keyboard...
Example would be (making up words here, hopefully won't offend anyone):
So, how should I go about this? Some theories I have are:
Any ideas? Anyone tried this before?
Take a look at Solr's Analyzers, Tokenizers, and Token Filters which give you a good intro to the type of manipulation you're looking for.