sitecore, lucene.net, sitecore8, accent-insensitive

Sitecore Lucene Greek search is accent-sensitive?


On a default Sitecore 8 installation, I have a bucket containing quite a few items. When I run a content search query against an RTE field in Greek, Sitecore appears to treat the search term in an accent-sensitive way, which is wrong for Greek.

Can someone point me in the right direction for making the index accent-insensitive for Greek?


Solution

  • It seems the issue was the way Sitecore interprets cultures and assigns a culture execution context to its searches and indexes.

    In this particular solution, we had renamed the "el-GR" language to "el" (so that it shows "nicely" in the URL). As a result, Sitecore was assigning a CultureInfo named "el", not "el-GR". In the defaultIndexConfiguration config file, however, the Greek analyzer is assigned only when the CultureInfo of the CultureExecutionContext is "el-GR". The data was therefore indexed with the StandardAnalyzer rather than the GreekAnalyzer, hence the accent sensitivity.

    We added extra config to cover the case where the CultureInfo is named "el" (in practice, a copy of the "el-GR" config node), and after the necessary index rebuild everything was OK.

    It remains rather unclear, though, why Sitecore would alter the name of the CultureInfo object...
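
The failure mode described above can be modelled in a few lines. The sketch below is not Sitecore or Lucene.Net code: `standard_analyze`, `greek_analyze`, `pick_analyzer` and `matches` are hypothetical stand-ins, and the assumption is that the analyzer is chosen by an exact match on the culture name, with a generic analyzer as the fallback. Under that assumption, content indexed under "el" never reaches the Greek analyzer until an "el" entry is added next to "el-GR".

```python
import unicodedata


def standard_analyze(text):
    """Hypothetical stand-in for a generic analyzer: lower-cases, keeps accents."""
    return text.lower().split()


def greek_analyze(text):
    """Hypothetical stand-in for a Greek-aware analyzer.

    Lower-cases, strips combining accent marks, and folds final sigma.
    """
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.replace("ς", "σ").split()


def pick_analyzer(culture_name, analyzer_map):
    """Exact-match lookup on the culture name; anything else gets the fallback."""
    return analyzer_map.get(culture_name, standard_analyze)


def matches(culture_name, analyzer_map, indexed_text, query):
    """True if every query token is found among the indexed tokens."""
    analyze = pick_analyzer(culture_name, analyzer_map)
    indexed_tokens = set(analyze(indexed_text))
    return all(token in indexed_tokens for token in analyze(query))


# Assumed original mapping: only "el-GR" is wired to the Greek analyzer.
before_fix = {"el-GR": greek_analyze}

# Assumed mapping after the fix: the "el-GR" entry is duplicated for "el".
after_fix = dict(before_fix, el=greek_analyze)

text = "Καλημέρα Αθήνα"   # indexed content, with accents
query = "αθηνα"           # search term typed without accents

print(matches("el", before_fix, text, query))     # falls back to the generic analyzer
print(matches("el-GR", before_fix, text, query))  # uses the Greek analyzer
print(matches("el", after_fix, text, query))      # "el" now mapped as well
```

In this model the rebuild step corresponds to re-running the analyzer over the stored text: tokens produced by the fallback analyzer keep their accents, so adding the "el" entry only helps once the content has been indexed again.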