We have the following problem: we want to do a full reindex with 100% read availability during the process. The problem arises when deleting old documents from the index. At the moment we're doing something like this:
1) fetch all data from the DB and update the Solr index via solrServer.add()
2) collect the IDs of all documents that were updated and compare them with all document IDs in the index
3) delete all documents that are in the index but weren't updated
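Roughly, steps 2 and 3 currently look like the sketch below (simplified; the uniqueKey field name "id", the updatedIds set and the page size are placeholders for our real code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class StaleDocumentCleanup {

    // updatedIds = IDs of all documents that were (re)indexed in step 1
    public void deleteStaleDocuments(SolrServer solrServer, Set<String> updatedIds) throws Exception {
        List<String> staleIds = new ArrayList<String>();
        int start = 0;
        final int rows = 1000;                        // page through the whole index
        while (true) {
            SolrQuery query = new SolrQuery("*:*");
            query.setFields("id");                    // only the uniqueKey is needed
            query.setStart(start);
            query.setRows(rows);
            QueryResponse response = solrServer.query(query);
            for (SolrDocument doc : response.getResults()) {
                String id = (String) doc.getFieldValue("id");
                if (!updatedIds.contains(id)) {       // in the index, but not re-indexed => stale
                    staleIds.add(id);
                }
            }
            start += rows;
            if (start >= response.getResults().getNumFound()) {
                break;
            }
        }
        if (!staleIds.isEmpty()) {
            solrServer.deleteById(staleIds);          // step 3
            solrServer.commit();
        }
    }
}
```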
This seems to work, but is there a better or easier solution for this task?
Changes do not become visible to searchers until you commit. So you can issue a delete of everything, then index all your documents, and commit only once at the very end; readers keep seeing the old index until that final commit. Just make sure automatic commits (the autoCommit settings in solrconfig.xml) are disabled, otherwise intermediate states become visible. This obviously requires more memory.
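A minimal sketch of that approach, assuming HttpSolrServer, a core URL of http://localhost:8983/solr/mycore and a placeholder loadAllDocumentsFromDb() standing in for your existing DB fetch:

```java
import java.util.Collection;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FullReindexWithSingleCommit {

    public static void main(String[] args) throws Exception {
        SolrServer solrServer = new HttpSolrServer("http://localhost:8983/solr/mycore"); // URL is an assumption

        // Mark everything for deletion; nothing becomes visible before the commit below.
        solrServer.deleteByQuery("*:*");

        // Re-index everything from the DB (step 1 of the question).
        Collection<SolrInputDocument> docs = loadAllDocumentsFromDb();
        solrServer.add(docs);

        // One commit at the end: readers switch from the old index content to the new one.
        solrServer.commit();
    }

    // Placeholder for the existing DB fetch; replace with your own data access code.
    private static Collection<SolrInputDocument> loadAllDocumentsFromDb() {
        throw new UnsupportedOperationException("replace with the DB fetch from step 1");
    }
}
```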
Alternatively, you can add a separate field with a generation stamp (e.g. an increasing ID or a timestamp). Every document indexed during the rebuild gets the current generation, and afterwards you issue a delete-by-query to pick up the left-over documents that still carry an old generation.
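A sketch of the generation-stamp variant; the field name index_generation_l (a long field that would have to exist in your schema) and using the run's start time as the generation value are assumptions:

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class GenerationalReindex {

    // One value per full reindex run, e.g. the start time of the run.
    private final long generation = System.currentTimeMillis();

    public void indexDocument(SolrServer solrServer, String id, String title) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("title", title);                   // your regular fields
        doc.addField("index_generation_l", generation); // assumed long field in the schema
        solrServer.add(doc);
    }

    // After the full reindex: everything that did not get the new generation is stale.
    public void deleteOldGenerations(SolrServer solrServer) throws Exception {
        solrServer.deleteByQuery("index_generation_l:[* TO " + (generation - 1) + "]");
        solrServer.commit();
    }
}
```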
Finally, you can index into a new core/collection and then swap it with the active one, so that the live name points to the freshly built index. After that you can just delete the old collection's directory.
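A sketch of the swap variant using SolrJ's CoreAdminRequest; the core names live and rebuild and the URLs are assumptions, and the same SWAP action can also be issued over the plain HTTP CoreAdmin API (/admin/cores?action=SWAP&core=live&other=rebuild):

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.common.params.CoreAdminParams.CoreAdminAction;

public class CoreSwapReindex {

    public static void main(String[] args) throws Exception {
        // The full reindex goes into a separate "rebuild" core while "live" keeps serving reads.
        SolrServer rebuildCore = new HttpSolrServer("http://localhost:8983/solr/rebuild");
        // ... index everything into rebuildCore here ...
        rebuildCore.commit();

        // Swap the cores so that the "live" name now points at the freshly built index.
        SolrServer adminServer = new HttpSolrServer("http://localhost:8983/solr");
        CoreAdminRequest swap = new CoreAdminRequest();
        swap.setAction(CoreAdminAction.SWAP);
        swap.setCoreName("live");
        swap.setOtherCoreName("rebuild");
        swap.process(adminServer);

        // The old index now sits under the "rebuild" core and its directory can be cleaned up.
    }
}
```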