lucene full-text-search replication clucene

Search index replication


I am developing an application that requires a CLucene index to be created in a desktop application, but replicated for (read-only) searching on iOS devices and efficiently updated when the index is updated.

Aside from simply re-downloading the entire index whenever it changes, what are my options here? CLucene does not support replication on its own, but Solr (which is built on top of Lucene) does, so it's clearly possible. Does anybody know how Solr does this and how one would approach implementing similar functionality?

If this is not possible, are there any (non-Java-based) full-text search implementations that would meet my needs better than CLucene?

Querying the desktop application is not an option - the mobile applications must be able to search offline.


Solution

  • A Lucene index is based on write-once read-many segments. This means that when new documents have been committed to a Lucene index, all you need to retrieve is:

    • the new segments,
    • the merged segments (old segments which have been merged into a single segment, if any),
    • the segments file (which stores information about the current segments).

    Once all these new files have been downloaded, the old segment files that were merged away can be safely removed. To pick up the changes, just reopen the IndexReader.

    Solr has a Java implementation to do this, but given how simple it is, a synchronization tool such as rsync would do the trick too. Incidentally, this is how Solr replication worked before Solr 1.4; you can still find some documentation about rsync replication on the wiki.
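
If rsync is not available on the device, the file-level synchronization described above can be done by hand. Here is a minimal sketch in Python of the planning step only; it is not how Solr or CLucene implements it. The names `plan_index_sync` and `SyncPlan` are hypothetical, and it assumes you can list the file names in the master index directory and in the local copy. It also assumes every file other than the segments file is write-once, so a file already present locally under the same name is skipped; if your CLucene version rewrites other files in place, compare sizes or checksums instead.

```python
from typing import Iterable, List, NamedTuple


class SyncPlan(NamedTuple):
    download: List[str]  # files to fetch, in this order
    delete: List[str]    # local files to remove once the downloads are done


def is_segments_file(name: str) -> bool:
    """True for the file(s) describing the current set of segments."""
    return name.startswith("segments")


def plan_index_sync(remote_files: Iterable[str],
                    local_files: Iterable[str]) -> SyncPlan:
    """Work out which index files to fetch and which to drop (hypothetical helper)."""
    remote, local = set(remote_files), set(local_files)

    # New and merged segment files: present on the master, missing locally.
    # Assumption: these are write-once, so same name means same content.
    new_data = sorted(f for f in remote - local if not is_segments_file(f))

    # The segments file is always fetched again, and fetched last, so it never
    # refers to segment files that have not arrived yet.
    commit_files = sorted(f for f in remote if is_segments_file(f))

    # Anything the master no longer has (e.g. segments that were merged away)
    # can be removed after the downloads complete.
    obsolete = sorted(local - remote)

    return SyncPlan(download=new_data + commit_files, delete=obsolete)


if __name__ == "__main__":
    # The master merged _1 and _2 into _3 and added a new segment _4.
    plan = plan_index_sync(
        remote_files=["_3.cfs", "_4.cfs", "segments_5"],
        local_files=["_1.cfs", "_2.cfs", "segments_4"],
    )
    print("download:", plan.download)
    print("delete:  ", plan.delete)
```

After carrying out the plan (downloads first, deletions last), reopen the IndexReader on the device to see the new documents.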