elasticsearch, spring-data-elasticsearch

multiple regexpQuery vs single matchQuery with fielddata enabled


Let's say I have a Subject entity with a list of Message entities. I want to retrieve all the "Subjects" whose "Messages" contain several words (like "elasticsearch" and "data") in their "body" attribute.

I'm using Spring Data Elasticsearch to create the NativeSearchQuery.

One approach would be to leave fielddata disabled and run several regexp queries, like this:

BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
// One regexp filter per word; a document must satisfy every filter
for (String word : wordsToSearchFor) {
    queryBuilder.filter(QueryBuilders.regexpQuery("message.body", ".*" + word.toLowerCase() + ".*"));
}
NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
    .withQuery(queryBuilder)
    .build();

Another approach would be to enable fielddata and run a single query, like this:

NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
    .withQuery(matchQuery("message.body", "elasticsearch data"))
    .build();

But from what I've read on the official Elastic site, this approach is discouraged because of high heap memory usage and slow hit times: https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html

Which approach would be better in this case?


Solution

  • I'm not sure what your use case is, but simple full-text search does not require enabling fielddata on a text field (it is disabled by default). As you already know, fielddata is costly and not recommended; instead, you should add a .keyword sub-field to your text field and do sorting and aggregations on that sub-field.

    Looking at your search query, you are not sorting or aggregating, so you shouldn't use fielddata. Also, regexp queries are expensive; if you can describe your use case, we can suggest a better way to build those queries.
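
To illustrate the point that plain full-text search needs no fielddata, here is a minimal sketch of the request body for a single match query that requires every word, assuming that is what the question is after. It uses only the JDK so it can be run on its own. The class name MatchAllWordsQuery and the helpers jsonEscape and build are hypothetical names chosen for this example; the "operator": "and" setting is the standard match query parameter that replaces the default OR behaviour.

```java
import java.util.List;

class MatchAllWordsQuery {

    // Escapes backslashes and double quotes only (enough for this sketch,
    // not a complete JSON string encoder).
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the JSON body of a match query on `field` that requires
    // all of `words` to be present (operator "and" instead of the default "or").
    static String build(String field, List<String> words) {
        String text = String.join(" ", words);
        return "{\"query\":{\"match\":{\"" + jsonEscape(field) + "\":{"
                + "\"query\":\"" + jsonEscape(text) + "\","
                + "\"operator\":\"and\"}}}}";
    }

    public static void main(String[] args) {
        // Field name and words are taken from the question.
        System.out.println(build("message.body", List.of("elasticsearch", "data")));
    }
}
```

With the builders from the question, the same idea would presumably be written as matchQuery("message.body", "elasticsearch data").operator(Operator.AND), assuming a client version where Operator lives in org.elasticsearch.index.query. Note that a match query works on whole analyzed terms, so unlike the ".*word.*" regexp it will not match a word embedded inside a longer token.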