solr lucene solrcloud

Solr Cloud Data Import Handler slow with replication


I am setting up a Solr Cloud deployment with 3 nodes and 3 shards. Without replication, my data import handler imports very quickly: around 1.2M documents in ~5 minutes. This is great; however, when I enable replication, i.e. re-create the collection with a replication factor of 2, the data import handler becomes significantly slower, taking around 1 hr 30 min for the same 1.2M documents.

I am using Solr 5.3.1 in cloud mode on three 4x16 virtual servers, with a ZooKeeper instance on each node. The data import comes from an MS SQL database.

Most of my configuration is the defaults that come with Solr. I have tried setting the auto commit intervals for both hard and soft commits to very long values, but this had no effect.

Any ideas/pointers would be much appreciated.

Thanks, Ewen


Solution

  • Maybe not a proper answer, but the issue seemed to resolve itself. Of course we must have done something to make this happen; the only changes we can think of were removing the CONSOLE logging from the log4j properties file and deleting the 11GB log file it had created.

    Guess this may at least give others who are having the same issue something else to try.
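
For anyone wanting to try the same two steps, here is a minimal Python sketch. It assumes the root logger line has the form `log4j.rootLogger=INFO, file, CONSOLE` (which I believe is what the stock Solr 5.x log4j.properties contains, but check your own file). The function names `drop_console_appender` and `large_files` are hypothetical helpers made up for this illustration; they are not part of Solr or log4j.

```python
import os


def drop_console_appender(properties_text):
    """Return log4j.properties text with CONSOLE removed from the rootLogger line.

    Assumption: the appender is named CONSOLE and is listed on a
    'log4j.rootLogger=LEVEL, appender1, appender2' line.
    """
    out = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("log4j.rootLogger") and "=" in stripped:
            key, value = stripped.split("=", 1)
            parts = [p.strip() for p in value.split(",")]
            kept = [p for p in parts if p != "CONSOLE"]
            line = key.strip() + "=" + ", ".join(kept)
        out.append(line)
    return "\n".join(out)


def large_files(directory, min_bytes):
    """List (path, size) for regular files in directory of at least min_bytes, largest first."""
    found = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size >= min_bytes:
                found.append((path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

The first helper only rewrites the text you pass in, so you can review the result before saving it back and restarting Solr. The second can be pointed at your Solr log directory (wherever that is on your install) with a threshold such as `1024 ** 3` to spot a log file that has grown to several GB.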