solr, solrj

Solr updates lost using ConcurrentUpdateSolrServer and atomic updates


I have a single Solr server (not SolrCloud) with auto commit every 15 seconds. After indexing many documents into it, I now want to change some of their fields. Since this change touches a very large number of documents, I need to run it with ~40 threads.

I use a single ConcurrentUpdateSolrServer instance shared by all my threads. I configured it to flush every 1000 docs and to use 48 internal threads (these are the server's own threads, not mine).

Since I want to add values to a multivalued field, I used an atomic "add" update.

I stop the process after updating ~5000 docs, and I call commit, blockUntilFinished and shutdown before exiting.

When I query the Solr server, only ~200 documents seem to have received the update.

I also tried this with only 1 thread of my own (still 48 threads inside the update server) and got the same problem.

When I switch from ConcurrentUpdateSolrServer to HttpSolrServer (1 thread), it works fine.


Solution

  • OK, solved it:

    The mistake was that I had a SolrDocument I wanted to update, so I converted it to a SolrInputDocument:

    // Copies every field of the retrieved SolrDocument, including its version data
    SolrInputDocument inputDoc =
        org.apache.solr.client.solrj.util.ClientUtils.toSolrInputDocument(solrDoc);
    Map<String, String> partialUpdate = new HashMap<String, String>();
    partialUpdate.put("add", "newAddedValue");
    inputDoc.addField("fieldName", partialUpdate);
    concurrentServer.add(inputDoc);
    

    I guess that since the SolrDocument contained version data (the _version_ field), it messed up the update.

    The right way is to build a new SolrInputDocument that contains only the document ID and the atomic update, like this:

    // Send only the unique key plus the atomic "add" operation
    SolrInputDocument inputDoc = new SolrInputDocument();
    inputDoc.addField("id", solrDoc.getFieldValue("id"));
    Map<String, String> partialUpdate = new HashMap<String, String>();
    partialUpdate.put("add", "newAddedValue");
    inputDoc.addField("fieldName", partialUpdate);
    concurrentServer.add(inputDoc);
    

    Thanks!
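The guess about version data can be made a bit more concrete. Solr's optimistic concurrency (described under "Updating Parts of Documents" in the Solr Reference Guide) treats a _version_ value sent with an update as a condition: a value greater than 1 means the stored version must match exactly, 1 means the document must exist, a negative value means it must not exist, and 0 or no value means no check at all. A failed check is rejected with HTTP 409 Conflict. Below is a minimal, stdlib-only Java sketch of just that rule, assuming a plain Map stands in for the index; VersionCheckSketch, index and accepts are hypothetical names, and this is a simplified model, not Solr's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of Solr's documented _version_ rules.
// It is NOT Solr's implementation; a Map stands in for the index.
class VersionCheckSketch {

    // Hypothetical stand-in for the index: document id -> current _version_.
    private final Map<String, Long> currentVersions = new HashMap<>();

    void index(String id, long version) {
        currentVersions.put(id, version);
    }

    // Returns true if an update carrying requestedVersion would be accepted.
    boolean accepts(String id, Long requestedVersion) {
        Long current = currentVersions.get(id);
        if (requestedVersion == null || requestedVersion == 0L) {
            return true; // no _version_ sent (or 0): no concurrency check
        }
        if (requestedVersion == 1L) {
            return current != null; // document must already exist
        }
        if (requestedVersion < 0L) {
            return current == null; // document must not exist yet
        }
        return requestedVersion.equals(current); // must match exactly
    }
}
```

In other words, a SolrInputDocument built with ClientUtils.toSolrInputDocument carries the _version_ the document was read with, and the update is only accepted if that value still matches the stored one, while the ID-only document skips the check entirely. As far as I know, in SolrJ 4.x ConcurrentUpdateSolrServer passes failed requests to its handleError(Throwable) method, which by default only logs them, so such rejections would not show up as exceptions in the calling threads; the client log is the place to look.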