What are the differences between the following implementations of SolrServer:
ConcurrentUpdateSolrServer
HttpSolrServer
CommonsHttpSolrServer (Note: Is this now deprecated?)
As mentioned in the documentation:
It is only recommended to use ConcurrentUpdateSolrServer with /update requests. The class HttpSolrServer is better suited for the query interface.
The documentation for ConcurrentUpdateSolrServer suggests using it for updates and HttpSolrServer for queries. Why is this?
At the moment I am using HttpSolrServer for everything. Will using ConcurrentUpdateSolrServer for updates result in significant performance improvements?
As of 2017, the Solr community has renamed SolrServer to SolrClient, and there are currently four implementations:
CloudSolrClient
ConcurrentUpdateSolrClient
HttpSolrClient
LBHttpSolrClient
The documentation suggests using ConcurrentUpdateSolrClient because it buffers all update requests in a queue (final BlockingQueue<Update> queue;), so updates should take less time than with HttpSolrClient, which sends each update request as soon as it receives it. We could simply trust the documentation, but the claim is easy to check, so I did some performance testing.
First, however, I will describe how the clients' operations differ. If you're using the add operation of SolrClient, it makes no difference whether you create an HttpSolrClient or a ConcurrentUpdateSolrClient, because both methods do the same thing. ConcurrentUpdateSolrClient only shines if you're explicitly issuing an UpdateRequest.
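To make the queueing behaviour described above more concrete, here is a minimal sketch of the general pattern: a bounded BlockingQueue drained by a fixed pool of worker threads. This is not SolrJ's actual implementation. The class name QueueingSenderSketch and the Consumer<List<T>> callback (standing in for one HTTP update request) are hypothetical, and error handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical illustration of "queue + thread pool"; not SolrJ code. */
final class QueueingSenderSketch<T> implements AutoCloseable {
    private final BlockingQueue<T> queue;
    private final ExecutorService workers;
    // Stand-in for "send one update request"; must be thread-safe,
    // because several workers may call it at the same time.
    private final Consumer<List<T>> sendRequest;
    private volatile boolean closed = false;

    QueueingSenderSketch(int queueSize, int threadCount, Consumer<List<T>> sendRequest) {
        this.queue = new LinkedBlockingQueue<>(queueSize);
        this.workers = Executors.newFixedThreadPool(threadCount);
        this.sendRequest = sendRequest;
        for (int i = 0; i < threadCount; i++) {
            workers.submit(this::drainLoop);
        }
    }

    /** Returns once the item is queued; blocks only while the queue is full. */
    void add(T item) {
        try {
            queue.put(item);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while queueing", e);
        }
    }

    /** Each worker takes whatever is waiting and sends it as one request. */
    private void drainLoop() {
        try {
            while (true) {
                T first = queue.poll(50, TimeUnit.MILLISECONDS);
                if (first == null) {
                    // Stop only after close() was called and nothing is left.
                    if (closed && queue.isEmpty()) {
                        return;
                    }
                    continue;
                }
                List<T> batch = new ArrayList<>();
                batch.add(first);
                queue.drainTo(batch); // pick up everything else already queued
                sendRequest.accept(batch);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Waits for queued items to be sent. Do not call add() after close(). */
    @Override
    public void close() {
        closed = true;
        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The caller's add() is cheap because it only touches the queue, while the slow network round trips happen on the worker threads and naturally cover several documents at once whenever the producer is faster than the sender. The queue size and thread count here play the same role as the two parameters varied in the test results.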
My machine: Intel i5-4670S 3.1 GHz, 16 GB RAM. Test results for indexing Wikipedia titles (code):
ConcurrentUpdateSolrClient (5 threads, 1000 queue size) - 200 seconds
ConcurrentUpdateSolrClient (5 threads, 10000 queue size) - 150 seconds
ConcurrentUpdateSolrClient (10 threads, 1000 queue size) - 100 seconds
ConcurrentUpdateSolrClient (10 threads, 10000 queue size) - 30 seconds
HttpSolrClient (no bulk) - 7000 seconds
HttpSolrClient (bulk 1000 docs) - 150 seconds
HttpSolrClient (bulk 10000 docs) - 80 seconds
Summary:
If you use both clients in the same fashion, e.g. client.add(doc); then ConcurrentUpdateSolrClient performs at least 10-20 times faster (in the test above, 7000 seconds for HttpSolrClient without bulk versus 30-200 seconds), thanks to its thread pool and queue (in effect, a bulk operation).
If you're using HttpSolrClient, you can still mimic this behaviour by manually creating several clients, running additional threads, and using some intermediate storage such as a List. This will certainly improve performance, but it requires additional code.
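As a rough idea of what that additional code might look like, here is a minimal single-threaded sketch of the "intermediate List" part only: documents are collected in a list and handed over in bulk once the list reaches a given size. The class BatchingSketch and its bulkSender callback are hypothetical names. In real code the callback would wrap a call such as SolrClient.add(Collection<SolrInputDocument>), which throws checked exceptions and so needs its own try/catch. The extra clients and threads mentioned above are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers documents and sends them in bulk. */
final class BatchingSketch<T> {
    private final int batchSize;
    // Stand-in for one bulk request (assumption: wraps something like client.add(docs)).
    private final Consumer<List<T>> bulkSender;
    private List<T> buffer = new ArrayList<>();

    BatchingSketch(int batchSize, Consumer<List<T>> bulkSender) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.bulkSender = bulkSender;
    }

    /** Buffers one document; sends a bulk request when the buffer is full. */
    void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; call once at the end for the last partial batch. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        bulkSender.accept(batch);
    }
}
```

With a batch size of 1000, adding 2500 documents results in two bulk requests during the adds and a third, for the remaining 500, on the final flush(). This corresponds to the "bulk 1000 docs" style of usage in the test results, not to the multi-threaded variant.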
These numbers probably mean very little in absolute terms, but I hope they give a rough comparison.