What is the replacement for FielddataLoading.Eager option in Elasticsearch mapping?


I am upgrading an app from Elasticsearch 2.3 to 7.9. I'm using NEST client version 7.11.1, which is listed as compatible with ES 7.9. We are using 7.9 because that is the latest version available on the AWS server we are working with.

The old application has the following field mapping:

.String(s => s
    .Name(f => f.PartDescription)
    .Analyzer(Analyzers.DescriptionAnalyzer)
    .Fielddata(descriptor => descriptor.Loading(FielddataLoading.Eager)));

I am using the following mapping to replace this in the new version:

.Text(t => t
    .Name(ep => ep.PartDescription)
    .Analyzer(Analyzer.DescriptionAnalyzer)
    .Fielddata(true))

I see that in the new version the only option for Fielddata is a boolean. The Eager and other options are missing.

Is Fielddata(true) a suitable equivalent for the upgrade?


Solution

  • The boolean on fielddata only determines whether fielddata is enabled for the field. fielddata is used when performing aggregations, sorting and scripting, and is loaded into the heap (into the fielddata cache) on demand, not eagerly. So Fielddata(true) enables fielddata, but it does not reproduce the old eager loading behaviour.

    Typically, you don't want fielddata on text datatype fields. Text fields undergo analysis, and the resulting tokens are stored in the inverted index. When fielddata is set to true, the inverted index is uninverted on demand to produce a columnar structure that is loaded into the heap to serve aggregations, sorting and scripting on text fields. Text analysis often produces many tokens that serve full-text search well but serve aggregation, sorting and scripting poorly. With many tokens and many concurrent aggregations, heap usage can grow quickly and put pressure on the garbage collector. That is why fielddata defaults to false on text datatype fields, and why you should set it to true only if you know what you're doing.

    Instead of setting fielddata to true on a text datatype field, a good approach is to use multi-fields: if you want to aggregate, sort or script on the field, also map it as a keyword datatype sub-field, and target that keyword multi-field for those operations.
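
The multi-field approach described above could look roughly like the following with NEST 7.x. This is a minimal sketch, not the asker's actual code: the `Part` class, the `parts` index name, the `http://localhost:9200` URL, the `keyword` sub-field name and the `IgnoreAbove(256)` limit are all assumptions made for illustration, and the built-in `standard` analyzer stands in for the custom description analyzer from the question.

```csharp
using System;
using Nest;

// Hypothetical document type; only PartDescription comes from the question.
public class Part
{
    public string PartDescription { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Assumed local cluster and index name.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("parts");
        var client = new ElasticClient(settings);

        // Map PartDescription as text for full-text search, plus a keyword
        // sub-field for aggregations, sorting and scripting.
        var createResponse = client.Indices.Create("parts", c => c
            .Map<Part>(m => m
                .Properties(p => p
                    .Text(t => t
                        .Name(f => f.PartDescription)
                        .Analyzer("standard") // stand-in for the custom analyzer
                        .Fields(ff => ff
                            .Keyword(k => k
                                .Name("keyword")
                                .IgnoreAbove(256)))))));

        if (!createResponse.IsValid)
        {
            Console.WriteLine(createResponse.DebugInformation);
            return;
        }

        // Aggregate on the keyword sub-field (PartDescription.keyword),
        // so no fielddata is needed on the text field itself.
        var searchResponse = client.Search<Part>(s => s
            .Size(0)
            .Aggregations(a => a
                .Terms("descriptions", t => t
                    .Field(f => f.PartDescription.Suffix("keyword")))));

        foreach (var bucket in searchResponse.Aggregations.Terms("descriptions").Buckets)
        {
            Console.WriteLine($"{bucket.Key}: {bucket.DocCount}");
        }
    }
}
```

With a mapping like this, full-text queries keep targeting `PartDescription`, while aggregations and sorts target `PartDescription.keyword`, which is backed by doc values on disk rather than by fielddata on the heap. Note that the keyword sub-field holds the whole unanalyzed value, so the buckets are complete descriptions rather than individual tokens.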