Tags: indexing, solr, lucene, solrcloud

Solr indexes documents very late


In some cases, when I feed a document to Solr for indexing, the document does not seem to get indexed at all, or is only indexed very late. This happens with SolrCloud 7.1.0.

My case:
- At 08:59:48.157 I add a document A
- At 08:59:48.264 I add a document B
- At 09:00:19.467 I perform a query and only document A is found.

This happens in an automated test, but it does not reproduce often: in roughly 90% of the runs it works fine (both A and B show up), and in the other ~10% I don't get both documents.

I have configured autoCommit to 15 seconds (openSearcher=false) and autoSoftCommit to 1 second (I know the latter should be as high as possible; I intend to increase it later).
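
For reference, this is roughly what the test does, as a minimal SolrJ sketch; the node address (localhost:5100) and collection name (Cloud) are taken from the logs below, while the document ids and the fixed sleep are simplifications of the real test:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;

    public class SoftCommitVisibilityCheck {
        public static void main(String[] args) throws Exception {
            // Node address and collection name taken from the logs below.
            try (SolrClient client = new HttpSolrClient.Builder("http://localhost:5100/solr").build()) {
                SolrInputDocument docA = new SolrInputDocument();
                docA.addField("id", "doc-A");               // hypothetical ids for illustration
                SolrInputDocument docB = new SolrInputDocument();
                docB.addField("id", "doc-B");

                client.add("Cloud", docA);
                client.add("Cloud", docB);

                // With autoSoftCommit=1000ms, both documents should become searchable
                // within a couple of seconds; the real test queries ~30 seconds later.
                Thread.sleep(5000);

                SolrQuery q = new SolrQuery("*:*");
                q.setRequestHandler("/query");              // the /query handler seen in the logs
                QueryResponse rsp = client.query("Cloud", q);
                System.out.println("hits: " + rsp.getResults().getNumFound());
            }
        }
    }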

I assume Solr logs when it auto (soft) commits, and I indeed see logging from a "commitScheduler" thread of DirectUpdateHandler2, but it runs only rarely. The first occurrence of the commitScheduler in the log after adding documents A and B is at 09:00:25, nearly 40 seconds after the documents were added.

I seem to get the same issue when deleting an object from the index: sometimes the delete just doesn't happen, or only very late. I see the "delete" in the log, yet a query fired 50 seconds later still returns the deleted object.
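
For reference, a delete followed by an explicit commit looks roughly like this in SolrJ (a minimal sketch; the node address and collection name are taken from the logs, the document id is a placeholder). If the deleted object still showed up after an explicit commit like this, autocommit timing could be ruled out as the cause:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class DeleteWithExplicitCommit {
        public static void main(String[] args) throws Exception {
            try (SolrClient client = new HttpSolrClient.Builder("http://localhost:5100/solr").build()) {
                client.deleteById("Cloud", "some-document-id");   // placeholder id
                // An explicit (hard) commit opens a new searcher, so the delete
                // becomes visible to queries immediately, independent of autocommit.
                client.commit("Cloud");
            }
        }
    }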

When I compare the logs of a successful run with those of an unsuccessful run, I don't see any difference. These are the SolrCloud logs of an unsuccessful run:

2018-09-14 08:59:48.144 INFO  (zkCallback-3-thread-4-processing-n:localhost:5100_solr) [   ] o.a.s.s.ZkIndexSchemaReader Finished refreshing schema in 34 ms
2018-09-14 08:59:48.151 INFO  (Thread-80) [   ] o.a.s.s.IndexSchema [Cloud_shard1_replica_n1] Schema name=base-schema
2018-09-14 08:59:48.156 INFO  (Thread-80) [   ] o.a.s.s.IndexSchema Loaded schema base-schema/1.6 with uniqueid field id

2018-09-14 08:59:48.157 INFO  (qtp947679291-17) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.u.p.LogUpdateProcessorFactory [Cloud_shard1_replica_n1]  webapp=/solr path=/update params={wt=javabin&version=2}{add=[5f8ecb57-2135-4c26-a9b3-6808531badd0 (1611572799756828672)]} 0 9

2018-09-14 08:59:48.160 INFO  (Thread-80) [   ] o.a.s.c.CoreContainer Reloading SolrCore 'Cloud_shard1_replica_n1' using configuration from collection Cloud
2018-09-14 08:59:48.160 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.c.SolrCore [[Cloud_shard1_replica_n1] ] Opening new SolrCore at [/var/app/current/solr/work/Cloud_shard1_replica_n1], dataDir=[/var/app/current/solr/work/Cloud_shard1_replica_n1/data/]
2018-09-14 08:59:48.181 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.h.a.SystemInfoHandler Resolving canonical hostname for local host prevented due to 'solr.dns.prevent.reverse.lookup' sysprop
2018-09-14 08:59:48.186 WARN  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.c.RequestHandlers no default request handler is registered (either '/select' or 'standard')
2018-09-14 08:59:48.187 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.u.CommitTracker Hard AutoCommit: if uncommited for 15000ms; 
2018-09-14 08:59:48.187 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.u.CommitTracker Soft AutoCommit: if uncommited for 1000ms; 
2018-09-14 08:59:48.202 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.s.SolrIndexSearcher Opening [Searcher@3844c2e[Cloud_shard1_replica_n1] main]
2018-09-14 08:59:48.203 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.r.ManagedResourceStorage Configured ZooKeeperStorageIO with znodeBase: /configs/tenant
2018-09-14 08:59:48.203 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.r.ManagedResourceStorage Loaded null at path _rest_managed.json using ZooKeeperStorageIO:path=/configs/tenant
2018-09-14 08:59:48.203 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.s.ZkIndexSchemaReader Creating ZooKeeper watch for the managed schema at /configs/tenant/managed-schema
2018-09-14 08:59:48.204 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.s.ZkIndexSchemaReader Current schema version 66 is already the latest
2018-09-14 08:59:48.204 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.h.ReplicationHandler Commits will be reserved for 10000ms.
2018-09-14 08:59:48.204 INFO  (searcherExecutor-540-thread-1-processing-n:localhost:5100_solr x:Cloud_shard1_replica_n1 s:shard1 c:Cloud r:core_node2) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.c.SolrCore [Cloud_shard1_replica_n1] Registered new searcher Searcher@3844c2e[Cloud_shard1_replica_n1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(7.2.0):C1870) Uninverting(_1(7.2.0):C4) Uninverting(_2(7.2.0):C1)))}
2018-09-14 08:59:48.227 INFO  (qtp947679291-15) [   ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :list with params action=LIST&wt=javabin&version=2 and sendToOCPQueue=true
2018-09-14 08:59:48.228 INFO  (qtp947679291-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={action=LIST&wt=javabin&version=2} status=0 QTime=0
2018-09-14 08:59:48.237 INFO  (qtp947679291-15) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.c.S.Request [Cloud_shard1_replica_n1]  webapp=/solr path=/schema/fields params={wt=javabin&version=2} status=0 QTime=0
2018-09-14 08:59:48.252 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.u.DefaultSolrCoreState New IndexWriter is ready to be used.

2018-09-14 08:59:48.264 INFO  (qtp947679291-15) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.u.p.LogUpdateProcessorFactory [Cloud_shard1_replica_n1]  webapp=/solr path=/update params={wt=javabin&version=2}{add=[8e209dd4-03ef-4397-8c6a-b947270af684 (1611572799877414912)]} 0 2

2018-09-14 08:59:48.269 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.s.SolrIndexSearcher Opening [Searcher@73887417[Cloud_shard1_replica_n1] main]
2018-09-14 08:59:48.270 INFO  (searcherExecutor-540-thread-1-processing-n:localhost:5100_solr x:Cloud_shard1_replica_n1 s:shard1 c:Cloud r:core_node2) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.c.SolrCore [Cloud_shard1_replica_n1] Registered new searcher Searcher@73887417[Cloud_shard1_replica_n1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(7.2.0):C1870) Uninverting(_1(7.2.0):C4) Uninverting(_2(7.2.0):C1)))}
2018-09-14 08:59:48.270 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.c.SolrCore [Cloud_shard1_replica_n1]  CLOSING SolrCore org.apache.solr.core.SolrCore@2460d222
2018-09-14 08:59:48.271 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.core.Cloud.shard1.replica_n1, tag=610325026
2018-09-14 08:59:48.271 INFO  (Thread-80) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.collection.Cloud.shard1.leader, tag=610325026

2018-09-14 09:00:19.467 INFO  (qtp947679291-18) [c:Cloud s:shard1 r:core_node2 x:Cloud_shard1_replica_n1] o.a.s.c.S.Request [Cloud_shard1_replica_n1]  webapp=/solr path=/query    params={q=*&df=_text_&qt=/query&_stateVer_=Cloud:4&fl=id,_displayName&start=0&sort=_displayName+asc&fq=(MY-FQ)&rows=1000&wt=javabin&version=2} hits=1 status=0 QTime=1

It seems that only the first test fails occasionally. This test is executed right after firing up SolrCloud. I have not yet seen later tests fail.

Any suggestions would be greatly appreciated.


Solution

  • I found out what was wrong thanks to the #solr guys on freenode.

    The problem was that we updated the Solr schema and fed Solr new documents right after that. Solr apparently needs a short break after a schema update, or the configured autocommit does not work correctly (I've been told this may be a Solr bug, but I'm not sure). That is, autocommit was triggered, but in some cases it committed only the first document and not the second. Only after another commit (either a manual one or one triggered by adding another document) was the second document actually committed.

    I found two possible fixes:

    • Remove the schema updates.
    • Reload the collection after updating the schema (see the sketch below).
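
    The reload can be issued from SolrJ roughly as sketched below (the collection name and node address are taken from the logs; this is an illustration, not the exact code we use):

      import org.apache.solr.client.solrj.SolrClient;
      import org.apache.solr.client.solrj.impl.HttpSolrClient;
      import org.apache.solr.client.solrj.request.CollectionAdminRequest;

      public class ReloadAfterSchemaUpdate {
          public static void main(String[] args) throws Exception {
              try (SolrClient client = new HttpSolrClient.Builder("http://localhost:5100/solr").build()) {
                  // Reload the collection so all replicas pick up the new schema
                  // before any documents are added.
                  CollectionAdminRequest.reloadCollection("Cloud").process(client);
              }
          }
      }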