I have to build a multilingual index for two languages whose scripts are completely different (Hindi and English), so their stemmers and lemmatisers don't affect each other. The index will be huge, containing millions of documents. Which of the following three approaches should I use for indexing?
Single field for both languages: a) since the scripts are different, I can apply both analysers to it. b) faster searching, because the number of fields stays small. c) I will need to take care of relevancy issues.
Language-specific fields: a) possibly slower searching because of the many fields.
Multicore approach: a) problems handling multilingual documents. b) administration will be harder. c) language-specific search will be easy.
I suggest separate cores. IMHO, it's simply the right way to go.
You don't have to use Solr's automatic language recognition, since you define analyzers (lemmatizers/stemmers) for each core/language separately. The only drawback is boilerplate config elements (most settings are the same for both cores).
See this recent, similar post:
Applying Language Specific Analyzer Dynamically before Solr Indexing
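To illustrate why automatic language recognition isn't strictly needed here: since Hindi and English use disjoint scripts, a client could pick the target core just by looking at the characters before sending a document to Solr. The following is only a rough sketch under that assumption. The core names `docs_hi` and `docs_en` and the helper functions are hypothetical, and it simply routes by whichever script dominates, so genuinely mixed documents would still need a policy of their own.

```python
from collections import Counter

# Hypothetical core names; substitute whatever your Solr cores are called.
CORE_BY_SCRIPT = {"devanagari": "docs_hi", "latin": "docs_en"}


def script_of(ch):
    """Classify one character as 'devanagari', 'latin', or None."""
    # The Devanagari Unicode block spans U+0900 to U+097F.
    if "\u0900" <= ch <= "\u097f":
        return "devanagari"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None  # digits, punctuation, whitespace, other scripts


def pick_core(text, default="docs_en"):
    """Return the core whose script accounts for most letters in text."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    if not counts:
        return default  # nothing to classify, fall back to the default core
    script, _ = counts.most_common(1)[0]
    return CORE_BY_SCRIPT[script]
```

For example, `pick_core("नमस्ते दुनिया")` selects the Hindi core and `pick_core("hello world")` the English one; the document would then be posted to that core's update handler, where the core's own analyzers take over.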