Let's say I have a Subject entity with a list of Message entities. I want to retrieve all the "Subjects" that have "Messages" containing several words (like "elasticsearch" and "data") in their "body" attribute.
I'm using Spring Data Elasticsearch to create the NativeSearchQuery.
One approach would be to leave fielddata disabled and run several regexp queries, like this:
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
for (String word : wordsToSearchFor) {
    // One regexp filter per word; all filters must match (logical AND)
    queryBuilder.filter(QueryBuilders.regexpQuery("message.body", ".*" + word.toLowerCase() + ".*"));
}
NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
    .withQuery(queryBuilder)
    .build();
Another approach would be to enable fielddata and run a single query, like this:
NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(matchQuery("message.body", "elasticsearch data"))
.build();
But from what I've read on the official Elastic site, this approach is discouraged because of high heap memory usage and slow response times: https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html
Which approach would be better in this case?
I'm not sure what your use case is, but simple full-text search doesn't require enabling fielddata on a text field (it is disabled by default). As you already know, it is costly and not recommended; instead, you should have a .keyword sub-field for your text field and do sorting and aggregations on that sub-field.
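To illustrate why plain full-text search works without fielddata: a text field is searched through an inverted index (term to document IDs), which is built at index time regardless of the fielddata setting. The toy class below is only a rough sketch of that idea, not how Elasticsearch is actually implemented; the class name, method names, and the naive tokenizer are all made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of an inverted index; not Elasticsearch code.
public class InvertedIndexSketch {
    // term -> ids of the documents whose body contains that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Naive analysis: lowercase and split on non-word characters.
    public void index(int docId, String body) {
        for (String term : body.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the ids of documents that contain every given word.
    public Set<Integer> searchAll(String... words) {
        Set<Integer> result = null;
        for (String word : words) {
            Set<Integer> docs =
                postings.getOrDefault(word.toLowerCase(), Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);   // first word: start from its postings
            } else {
                result.retainAll(docs);         // later words: intersect
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.index(1, "Elasticsearch stores data");
        idx.index(2, "Elasticsearch is a search engine");
        idx.index(3, "Some unrelated data");
        System.out.println(idx.searchAll("elasticsearch", "data"));
    }
}
```

Looking up words only needs the term-to-documents direction. Sorting and aggregations need the opposite direction (document to terms), which is what fielddata builds in heap memory for text fields.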
Looking at your search query, you are not doing any sorting or aggregations, so you shouldn't use fielddata. Also, regexp queries are expensive; if you can share your use case, we can suggest a better way to build those queries.
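One thing worth checking in the meantime: a match query uses the OR operator by default, so "elasticsearch data" would also match messages containing only one of the two words. If every word must be present, the match query accepts an "operator": "and" option. The sketch below just builds that request body as a JSON string with plain Java so it can be run without any dependencies; the class and method names are hypothetical, and it assumes "message.body" is a regular text field reachable by that path (a nested mapping would need a different query).

```java
import java.util.List;

// Hypothetical helper that renders a match query requiring all words.
public class MatchAllWordsQuery {

    // Minimal JSON string escaping for backslashes and double quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds: {"query":{"match":{FIELD:{"query":WORDS,"operator":"and"}}}}
    public static String matchAllWords(String field, List<String> words) {
        String text = String.join(" ", words);
        return "{\"query\":{\"match\":{\"" + escape(field) + "\":{\"query\":\""
                + escape(text) + "\",\"operator\":\"and\"}}}}";
    }

    public static void main(String[] args) {
        System.out.println(
            matchAllWords("message.body", List.of("elasticsearch", "data")));
    }
}
```

With the builders from the question, the same thing should be expressible as matchQuery("message.body", "elasticsearch data").operator(Operator.AND), where Operator is org.elasticsearch.index.query.Operator. Neither form needs fielddata.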