I am using Elasticsearch to index my documents (although I believe my question can apply to any other search engine such as Lucene or Solr as well).
I am using the Porter stemmer and a list of stop words at index time. I know that I should apply the same stemmer and stop word removal at search time to get correct results.
My question is: what if I decide to change my stemmer, or add or remove a couple of words from the list of stop words? Should I reindex all the documents (or all the text fields) to apply the changes? Or is there any other approach to deal with this situation?
Yes, if you need to change your analyzer significantly you must reindex your documents. If you don't, changes will only affect query analysis. You might be able to get away with that on a change to a StopFilter, but not when changing a stemmer. Reindexing is the only way to apply new analysis rules to indexed data, whether you reindex by dumping the whole thing and rebuilding it from scratch, or by updating the documents.
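To make the mismatch concrete, here is a minimal sketch of a toy inverted index in plain Python. It is not how Elasticsearch or Lucene is implemented, and `stem_v1`, `stem_v2`, `analyze`, `build_index`, and `search` are hypothetical names with deliberately simplistic stemmers (not Porter). It only illustrates that the terms stored in the index were produced by the old analyzer, so a query analyzed with a new stemmer can look up terms that were never stored.

```python
from collections import defaultdict

STOP_WORDS = {"the", "a", "of"}


def stem_v1(token):
    # Toy "old" stemmer (assumption, not Porter): strips a trailing "s".
    return token[:-1] if token.endswith("s") else token


def stem_v2(token):
    # Toy "new" stemmer (assumption): also strips a trailing "ing".
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text, stem, stop_words):
    # Lowercase, split on whitespace, drop stop words, then stem.
    return [stem(t) for t in text.lower().split() if t not in stop_words]


def build_index(docs, stem, stop_words):
    # Map each analyzed term to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text, stem, stop_words):
            index[term].add(doc_id)
    return index


def search(index, query, stem, stop_words):
    # AND together the postings of every analyzed query term.
    result = None
    for term in analyze(query, stem, stop_words):
        postings = index.get(term, set())
        result = postings if result is None else result & postings
    return result or set()


DOCS = {1: "indexing the documents", 2: "a list of stop words"}

# Index built with the old stemmer stores the term "indexing".
old_index = build_index(DOCS, stem_v1, STOP_WORDS)

# Query analyzed with the new stemmer looks up "index", which is not stored.
stale_hits = search(old_index, "indexing", stem_v2, STOP_WORDS)

# After reindexing with the new stemmer, both sides agree again.
new_index = build_index(DOCS, stem_v2, STOP_WORDS)
fresh_hits = search(new_index, "indexing", stem_v2, STOP_WORDS)

# Adding a stop word at query time only still works: the extra term
# is simply never looked up, even though it remains in the old index.
stop_hits = search(old_index, "list words", stem_v1, STOP_WORDS | {"list"})
```

In this toy setup `stale_hits` is empty while `fresh_hits` contains document 1, which is the stemmer problem in miniature. The stop word case behaves differently because dropping a term from the query never requires the index to contain anything new.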
As for other approaches, if you don't want to reindex, you are stuck limiting your analysis changes to query time, which drastically limits what you can do (you could make a SynonymFilter work, but again, changes to the stemmer are definitely out).
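As a rough illustration of why synonyms can work at query time only, here is a small sketch that expands each query term into its synonyms before looking it up in an index that is left untouched. This is not the Lucene SynonymFilter API; `SYNONYMS`, `INDEX`, `expand`, and `search_with_synonyms` are hypothetical names, and the index is a hand-written dictionary standing in for already-indexed terms.

```python
# Hypothetical synonym map, applied at query time only.
SYNONYMS = {"car": {"automobile"}}

# Stand-in for an existing index (term -> document ids) that is never rebuilt.
INDEX = {"car": {1}, "automobile": {2}, "bike": {3}}


def expand(term):
    # A term matches itself plus any configured synonyms.
    return {term} | SYNONYMS.get(term, set())


def search_with_synonyms(index, query):
    # OR across the variants of each term, AND across the query terms.
    result = None
    for term in query.lower().split():
        postings = set()
        for variant in expand(term):
            postings |= index.get(variant, set())
        result = postings if result is None else result & postings
    return result or set()


hits = search_with_synonyms(INDEX, "car")
```

Here `hits` contains documents 1 and 2 even though nothing was reindexed, because the expansion only adds terms to look up. A stemmer change cannot be handled this way, since it changes what the stored terms themselves would have to be.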