performance elasticsearch spring-data-elasticsearch elasticsearch-high-level-restclient

Elasticsearch Spring Data and Elasticsearch HighLevelClient performance for complex aggregations

I was trying to find benchmarks that compare the performance of Elasticsearch Spring Data with the Elasticsearch HighLevelClient for search queries with complex nested aggregations before writing one myself.

But the only thing I found was that if you need CRUD operations, it is easier to use Spring Data, along with some other features such as auto-configuration. None of it was performance related.

I want to know if any of you have used both and tested their performance? Are there any technical reasons that one of them is faster in such queries or not?

Solution

The most important part here is to make sure that you get the correct underlying query. We've recently had a case where the wrong setting cost us almost 10x in performance. Spring Data uses the High Level REST Client, so I would generally expect no or only a small overhead if the underlying query is the same. The framework differences are probably small enough that I would prioritize development speed and familiarity.

Our mistake was to return the underlying docs in the aggregation, which is a lot more data to send around and (de)serialize, and it also won't use the cache. That made a difference of 400ms vs 40ms for our aggregation (when we hit the cache).

Edit by P.J.Meisch (hope you don't mind, @xeraa), no need for an extra answer:

As already stated, Spring Data Elasticsearch uses the Elasticsearch RestHighLevelClient (and will later use the new Elasticsearch client). To create an aggregation query, you need to use the NativeSearchQuery, where you build the query using Elasticsearch's query builders. So building the query is the same as when using the RestHighLevelClient directly.

As already mentioned by @xeraa, if you just need the aggs and not the query data, make sure not to return the source docs. In Spring Data Elasticsearch you do that with NativeSearchQueryBuilder.withMaxResults(0). You then pass the query as usual to the ElasticsearchOperations.search() method.
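To check that both clients end up sending the same thing, it can help to know what an aggregation-only request looks like on the wire. The sketch below uses only the JDK and builds such a request body by hand. The class name, the aggregation name "by_category" and the field "category" are made up for illustration, and I am assuming (not quoting from the docs) that withMaxResults(0) ends up as "size": 0 in the request. With "size": 0, Elasticsearch returns no hits, only the aggregation results.

```java
public class AggregationOnlyRequest {

    /**
     * Builds a search request body that asks only for a terms aggregation.
     * "size":0 means no hits (and so no _source documents) come back.
     *
     * aggName and field are hypothetical placeholders. They are inserted
     * without JSON escaping, so this is only suitable for simple names.
     */
    public static String body(String aggName, String field) {
        return "{"
            + "\"size\":0,"
            + "\"query\":{\"match_all\":{}},"
            + "\"aggs\":{\"" + aggName + "\":{\"terms\":{\"field\":\"" + field + "\"}}}"
            + "}";
    }

    public static void main(String[] args) {
        // Print the body so it can be compared with the logged request
        // of either client.
        System.out.println(body("by_category", "category"));
    }
}
```

If the request logged by your application has a non-zero size, or the aggregation carries the documents along, that is the first thing I would look at before comparing the two clients.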

Spring Data Elasticsearch does not do any parsing of the returned aggregations; you will have to do the same there as you would when using the client directly.

So I don't see a point where Spring Data Elasticsearch will contribute to a performance problem.