I've been developing a new search API with AWS Elasticsearch (version 6.2) as backend.
Right now, I'm trying to support "sort" options for the API.
My mapping is as follows (unrelated fields not included):
{
"properties": {
"id": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"description": {
"type": "text"
},
"materialDefinitionProperties": {
"type": "nested",
"properties": {
"id": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "case_sensitive_analyzer"
},
"value" : {
"type": "nested",
"properties": {
"valueString": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
I'm attempting to allow the users sort by property value (path: materialDefinitionProperties.value.valueLong.raw
).
Note that it's inside 2 levels of nested objects (materialDefinitionProperties and materialDefinitionProperties.value are nested objects).
To sort the results by the value of property with ID "PART NUMBER", my request for sorting is:
{
"fieldName": "materialDefinitionProperties.value.valueString.raw",
"nestedSort": {
"path": "materialDefinitionProperties",
"filter": {
"fieldName": "materialDefinitionProperties.id",
"value": "PART NUMBER",
"slop": 0,
"boost": 1
},
"nestedSort": {
"path": "materialDefinitionProperties.value"
}
},
"order": "ASC"
}
However, as I examined the response, the "sort" field does not match with document's property value:
{
"_index": "material-definition-index-v2",
"_type": "default",
"_id": "development_LITL4ZCNE",
"_source": {
"id": "LITL4ZCNE",
"description": [
"CPU, Intel, Cascade Lake, 8259CL, 24C, 210W, B1 Prod"
]
"materialDefinitionProperties": [
{
"id": "PART NUMBER",
"description": [],
"value": [
{
"valueString": "202-001193-001",
"isOriginal": true
}
]
}
]
},
"sort": [
"100-000018"
]
},
The document's PART NUMBER property is "202-001193-001", the "sort" field says "100-000018", which is the part number of another document.
It seems that there's a mismatch between the master document and nested object used for sorting.
This request worked well when there's only a small number of documents in the cluster. But once I backfill the cluster with ~1 million of records, the symptom appears. I've also tried creating a new ES cluster but the results are the same.
Sorting by other non-nested attributes worked well.
Did I misunderstand the concept of nested objects, or misuse the nested sort feature?
Any ideas appreciated!
This is a bug in Elasticsearch. Upgrading to 6.4.0 fixed the issue.
Issue tracker: https://github.com/elastic/elasticsearch/pull/32204
Release note: https://www.elastic.co/guide/en/elasticsearch/reference/current/release-notes-6.4.0.html