java, search, lucene, search-engine

Lucene: How to perform search on several independent index sets and merge the result?


I have several Lucene index sets (I call them shards), each indexing a different document set. They are independent, which means I can search any one of them without reading the others. When a query request arrives, I want to run it against every index set and combine the results into the final top documents.

I know that when scoring documents, Lucene needs the IDF of every term, and different index sets will give different IDF values for the same term (because they hold different document sets). So, as I understand it, I cannot directly compare document scores from different index sets. How should I generate the final result?

An obvious solution would be to merge the indexes first and then search the single big index. However, this is too time-consuming for me and thus unacceptable. Does anyone have a better solution?

P.S.: I don't want to use any packages or software (like Katta) other than Lucene and Hadoop.


Solution

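  • I think MultiReader is what you are looking for. It presents several IndexReaders as one logical index without physically merging them, so the searcher works with term statistics (and therefore IDF) taken over all of them, and the scores it returns are directly comparable. If you have multiple IndexReaders, say reader1 and reader2:

    // Wrap the per-shard readers; no index data is copied or merged on disk.
    MultiReader multiReader = new MultiReader(reader1, reader2);
    // Search the combined view as if it were a single index.
    IndexSearcher searcher = new IndexSearcher(multiReader);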
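
To see why per-shard scores are not comparable while statistics taken over all shards are, here is a minimal sketch in plain Java (no Lucene dependency). It assumes a BM25-style IDF, ln(1 + (N - df + 0.5) / (df + 0.5)); the class `ShardIdfDemo`, the helper `ShardStats`, and the shard numbers are all made up for illustration and are not part of Lucene's API.

```java
import java.util.Arrays;
import java.util.List;

public class ShardIdfDemo {
    /** Hypothetical holder: document count and document frequency of one term in one shard. */
    static final class ShardStats {
        final long numDocs;
        final long docFreq;

        ShardStats(long numDocs, long docFreq) {
            this.numDocs = numDocs;
            this.docFreq = docFreq;
        }
    }

    /** BM25-style IDF (an assumed formula for this sketch). */
    static double idf(long numDocs, long docFreq) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** IDF computed from document counts and frequencies summed over all shards. */
    static double globalIdf(List<ShardStats> shards) {
        long totalDocs = 0;
        long totalDocFreq = 0;
        for (ShardStats s : shards) {
            totalDocs += s.numDocs;
            totalDocFreq += s.docFreq;
        }
        return idf(totalDocs, totalDocFreq);
    }

    public static void main(String[] args) {
        // The same term is rare in the first shard and common in the second.
        List<ShardStats> shards = Arrays.asList(
                new ShardStats(1000, 10),
                new ShardStats(1000, 500));

        for (ShardStats s : shards) {
            System.out.printf("shard idf  = %.3f%n", idf(s.numDocs, s.docFreq));
        }
        System.out.printf("global idf = %.3f%n", globalIdf(shards));
    }
}
```

The two per-shard values differ widely for the same term, which is exactly the problem described in the question. The summed statistics give one value that applies to every document, which is the same value a physically merged index would produce under this formula.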