
In-place updates using SolrJ


I am trying to achieve in-place updates for documents.

Solr Version - 5.5.2

Schema.xml -

<dynamicField name="store_*" type="int" indexed="false" stored="false" docValues="true"/>
<field name="_version_" type="long" indexed="false" stored="false" docValues="true" multiValued="false"/>

solrconfig.xml -

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
    <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
  </updateLog>
</updateHandler>

UpdateHandler being used - DirectUpdateHandler2

According to this article, the target field must be a non-indexed (indexed="false"), non-stored (stored="false"), single-valued (multiValued="false"), numeric docValues (docValues="true") field.

I am only adding the document using updateHandler.addDoc(addUpdateCommand); and am NOT committing afterwards with solrClient.commit();

The issue is that, without a commit, the update is not reflected in search results.

If I enable autoSoftCommit and only add the document, the changes are reflected in the index, but the filterCache is cleared.

My aim is to achieve an in-place update without clearing the filterCache.

Can this be achieved?


Solution

  • Short answer: no, you can't both index a document (a partial or in-place update is still an indexing operation) and have it searchable (or its changes visible) without clearing Solr's caches.

    Long answer: You can index documents and keep the caches populated (openSearcher=false), but the newly indexed documents will not appear in search results until a commit opens a new searcher, that is, a soft commit or a hard commit with openSearcher=true. To understand why, it helps to know how Solr/Lucene works:

    1. A Lucene index is represented as a set of segments. Each segment is a self-contained index in its own right, made up of multiple files. Once written to disk, segments are mostly immutable.

    2. Each Solr core has a single instance of IndexSearcher to perform the queries. The IndexSearcher has a static view of all the segments that existed when it was created. This view doesn't change for the lifetime of the IndexSearcher and the caches belong to the IndexSearcher.

    3. Whenever you issue a commit, a new segment is created. This operation creates a new IndexSearcher to reflect the newly added (or updated) documents. While the new IndexSearcher is being initialised, the old one is still processing requests. Once the new IndexSearcher has finished initialising, the old one is unregistered (destroyed) and the new IndexSearcher starts to serve query requests.

    So the filterCache is cleared because each cache belongs to one IndexSearcher, and the new IndexSearcher starts with fresh caches. You can, however, use autowarming to pre-populate the new caches with entries from the old ones (see autowarmCount in solrconfig.xml). Take care, because warming can hurt performance: the new IndexSearcher re-runs a configurable number or percentage of the filter queries, using the keys (queries) from the old IndexSearcher's cache, and it is not ready to serve requests until warming finishes.
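
    As a rough illustration, autowarming for the filterCache is set on the cache element inside the <query> section of solrconfig.xml. The cache class, sizes and autowarmCount below are assumed values for the sketch, not recommendations; they would need tuning against your own query load.

    ```xml
    <query>
      <!-- Assumed example values. autowarmCount is the number of entries
           (or a percentage such as "25%") taken from the old searcher's
           cache and re-executed against the new searcher before it
           starts serving requests. -->
      <filterCache class="solr.FastLRUCache"
                   size="512"
                   initialSize="512"
                   autowarmCount="128"/>
    </query>
    ```

    A larger autowarmCount keeps more filters hot after each commit, at the price of a longer delay before the new searcher becomes visible.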

    See: https://wiki.apache.org/solr/SolrCaching#autowarmCount

    PS: It's usually not advisable to issue a commit for each new document/update, for the reasons above. It's preferable to rely on automatic hard and soft commits.
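
    A minimal sketch of such a setup, placed inside the existing <updateHandler> element of solrconfig.xml. The intervals are assumptions chosen for illustration only: the hard commit flushes to disk without opening a new searcher (so the caches survive), while the soft commit opens a new searcher and therefore decides how often the caches are rebuilt.

    ```xml
    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- updateLog as in the question, omitted here for brevity -->

      <!-- Hard commit every 15 s (assumed value): durable on disk,
           but no new searcher, so filterCache is left untouched. -->
      <autoCommit>
        <maxTime>15000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>

      <!-- Soft commit every 60 s (assumed value): makes updates visible
           by opening a new searcher, which starts with new caches. -->
      <autoSoftCommit>
        <maxTime>60000</maxTime>
      </autoSoftCommit>
    </updateHandler>
    ```

    Lengthening the soft commit interval trades update visibility for fewer cache rebuilds.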