
Difference between IndexBasedSpellChecker and DirectSolrSpellChecker in Solr?


While going through the SpellCheck feature in Solr, I found the following types of spellchecker:

  1. IndexBasedSpellChecker
  2. DirectSolrSpellChecker
  3. FileBasedSpellChecker

From the definition in the Solr docs, "The DirectSolrSpellChecker uses terms from the Solr index without building a parallel index like the IndexBasedSpellChecker", I understood that IndexBasedSpellChecker creates a parallel index, and that this parallel index must be rebuilt whenever the base index it was built from changes.

With DirectSolrSpellChecker, on the other hand, no parallel index is created, so nothing needs to be rebuilt again and again.

My question is: if creating a parallel index is the only difference between these two spellcheck types, why did Solr introduce a new type, DirectSolrSpellChecker, in the Solr 4.0 release instead of updating IndexBasedSpellChecker?

Since IndexBasedSpellChecker was not updated and a new type called DirectSolrSpellChecker was created instead, my questions are:

  1. What is the advantage of building a parallel index (as in IndexBasedSpellChecker), and what is the advantage of spellchecking without one (as in DirectSolrSpellChecker)?

  2. What is the actual difference between IndexBasedSpellChecker and DirectSolrSpellChecker?

  3. When should one use IndexBasedSpellChecker, and when DirectSolrSpellChecker?


Solution

  • Part of the answer is in your question (the only difference is that one requires its own index and the other does not), but I would add:

    • The DirectSolrSpellChecker uses terms from the Solr index, so it never has to be built: its terms are always up to date with the main index.

      The drawback is that the work moves to query time: suggestions are computed against the main index's term dictionary on each spellcheck request, since there is no separate spellcheck structure to consult.

    • The IndexBasedSpellChecker, by contrast, uses its own index, built from the main index. The advantage here is that you decide when to rebuild the dictionary, independently of commits to the main index.

      Suppose you need real-time indexing so that your users can search and retrieve their updated documents very quickly, which can be very costly in terms of performance. In this case, a separate spellcheck index lets you avoid updating the spellcheck dictionary every time the main index changes (by setting buildOnCommit=false), i.e. you can trigger the rebuild on a schedule or manually. You can still set buildOnCommit=true to rebuild the spellcheck index at every commit.

      The drawback is that it requires more disk space, and suggestions lag behind the main index until the next rebuild.
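
To make the comparison concrete, here is a minimal sketch of a solrconfig.xml fragment that declares both spellcheckers side by side. The field name "text", the dictionary names "direct" and "indexed", and the directory "./spellchecker" are assumptions chosen for illustration; adapt them to your own schema, and treat this as a starting point rather than a complete configuration.

```xml
<!-- Sketch only: the field "text", the names "direct" and "indexed",
     and the directory "./spellchecker" are illustrative assumptions. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <!-- Reads candidate terms straight from the main index: nothing to build. -->
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="field">text</str>
  </lst>

  <!-- Keeps a separate spellcheck index on disk under spellcheckIndexDir. -->
  <lst name="spellchecker">
    <str name="name">indexed</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">text</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- false: rebuild only when asked, not at every commit -->
    <str name="buildOnCommit">false</str>
  </lst>

</searchComponent>
```

With buildOnCommit=false, the separate index is rebuilt only when a request asks for it, for example one that carries spellcheck.build=true together with spellcheck.dictionary=indexed (assuming the component is attached to the request handler you query). That request is what you would run manually or from a scheduled job.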