Solr multilingual stemming


I'm using Solr to index documents such as .pdf and .docx files. The documents are in French or English, and I want stemming to work for both languages.

For example, if I search for "chevaux" I want to find "cheval" (French), and if I search for "raise" I want to find "raising" (English). Is there a way to do this without creating two cores (one for English and one for French)?


Solution

  • Have two fields, one with the field definition you want for French, and one with the field definition you want for English. Then use the Language Detection feature to submit the content to the correct field.

    When searching, query the field that matches the user's language. If you don't know the language, search both fields, or run language detection on the query to make a better guess.

    You can also index the same content into both fields, but my initial guess is that it will give you odd results down the road: someone enters a French word, and because of the English processing rules you get hits that wouldn't have occurred if you had indexed only into the correct field.

    By enabling langid.map, you can tell Solr to index the content into fields named fieldname_langcode (where fieldname is picked up from langid.fl).

    langid.map: Enables field name mapping. If true, Solr will map field names for all fields listed in langid.fl.

    You can use langid.map.replace or langid.map.pattern if you want to change the default fieldname_langcode naming, but I'd leave those alone for now.
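
As a rough illustration of the query side, here is a minimal Python sketch. It assumes langid.fl is set to a field named content, so that langid.map produces content_en and content_fr, and that the core lives at a URL like http://localhost:8983/solr/docs; those names, and the helper functions, are made up for the example. It uses the edismax parser's qf parameter to search either one language field or both.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Assumptions for this sketch (not from the original answer):
# langid.fl=content, so langid.map yields content_en and content_fr.
BASE_FIELD = "content"
LANGUAGES = ("en", "fr")


def mapped_field(base_field: str, lang_code: str) -> str:
    """Mirror the default langid.map naming scheme: fieldname_langcode."""
    return f"{base_field}_{lang_code}"


def build_query_params(user_query: str, lang: Optional[str] = None) -> Dict[str, str]:
    """Build edismax parameters for one known language, or for all of them.

    If lang is missing or not one of LANGUAGES, every language field is searched.
    """
    langs = (lang,) if lang in LANGUAGES else LANGUAGES
    return {
        "q": user_query,
        "defType": "edismax",
        # qf takes a space-separated list of fields to search.
        "qf": " ".join(mapped_field(BASE_FIELD, code) for code in langs),
    }


def build_select_url(core_url: str, user_query: str, lang: Optional[str] = None) -> str:
    """Return a /select URL for the given core (core_url is hypothetical)."""
    return f"{core_url}/select?{urlencode(build_query_params(user_query, lang))}"


if __name__ == "__main__":
    core = "http://localhost:8983/solr/docs"  # hypothetical core URL
    print(build_select_url(core, "chevaux", "fr"))  # known language: one field
    print(build_select_url(core, "raise"))          # unknown language: both fields
```

Searching both fields is the safe fallback when the user's language is unknown; restricting qf to a single field avoids the cross-language false hits described above.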