java amazon-web-services elasticsearch aws-elasticsearch

Elasticsearch nested sort - mismatch between document and nested object used for sorting


I've been developing a new search API with AWS Elasticsearch (version 6.2) as the backend.

Right now, I'm trying to support "sort" options for the API.

My mapping is as follows (unrelated fields not included):

{
  "properties": {
    "id": {
      "type": "text",
      "fields": {
        "raw": {
          "type":  "keyword"
        }
      }
    },
    "description": {
      "type": "text"
    },
    "materialDefinitionProperties": {
      "type": "nested",
      "properties": {
        "id": {
          "type": "text",
          "fields": {
            "raw": {
              "type":  "keyword"
            }
          },
          "analyzer": "case_sensitive_analyzer"
        },
        "value" : {
          "type": "nested",
          "properties": {
            "valueString": {
              "type": "text",
              "fields": {
                "raw": {
                  "type":  "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}

I'm attempting to allow users to sort by property value (path: materialDefinitionProperties.value.valueString.raw).

Note that it's inside 2 levels of nested objects (materialDefinitionProperties and materialDefinitionProperties.value are nested objects).

To sort the results by the value of the property with ID "PART NUMBER", my sort request is:

{
    "fieldName": "materialDefinitionProperties.value.valueString.raw",
    "nestedSort": {
        "path": "materialDefinitionProperties",
        "filter": {
            "fieldName": "materialDefinitionProperties.id",
            "value": "PART NUMBER",
            "slop": 0,
            "boost": 1
        },
        "nestedSort": {
            "path": "materialDefinitionProperties.value"
        }
    },
    "order": "ASC"
}
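
For reference, the request above presumably maps onto the `nested` sort syntax that Elasticsearch 6.1+ accepts in the search body. The following is a minimal sketch in plain Java (no Elasticsearch client) that assembles such a sort clause as nested maps and prints it as JSON. The class and method names are hypothetical, and translating the filter into a `match_phrase` query is an assumption based on the `slop` and `boost` parameters; the actual API layer may build something different.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the sort clause by hand so the structure is visible.
class NestedSortSketch {

    /** Builds one entry of the "sort" array for a field that sits two nested levels deep. */
    static Map<String, Object> nestedSort(String field, String outerPath, String innerPath,
                                          String filterField, String filterValue, String order) {
        // Assumption: the API's filter (value, slop, boost) becomes a match_phrase query.
        Map<String, Object> phrase = new LinkedHashMap<>();
        phrase.put("query", filterValue);
        phrase.put("slop", 0);
        phrase.put("boost", 1);
        Map<String, Object> filter = Collections.singletonMap(
                "match_phrase", Collections.singletonMap(filterField, phrase));

        // Inner nested level (materialDefinitionProperties.value in the question).
        Map<String, Object> inner = Collections.singletonMap("path", innerPath);

        // Outer nested level, restricted by the filter to the wanted property.
        Map<String, Object> outer = new LinkedHashMap<>();
        outer.put("path", outerPath);
        outer.put("filter", filter);
        outer.put("nested", inner);

        Map<String, Object> spec = new LinkedHashMap<>();
        spec.put("order", order);
        spec.put("nested", outer);
        return Collections.singletonMap(field, spec);
    }

    /** Tiny JSON writer for maps, strings and numbers; no escaping, sketch only. */
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(",");
                }
                sb.append('"').append(e.getKey()).append("\":").append(toJson(e.getValue()));
                first = false;
            }
            return sb.append("}").toString();
        }
        if (value instanceof String) {
            return "\"" + value + "\"";
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Map<String, Object> sort = nestedSort(
                "materialDefinitionProperties.value.valueString.raw",
                "materialDefinitionProperties",
                "materialDefinitionProperties.value",
                "materialDefinitionProperties.id",
                "PART NUMBER",
                "asc");
        System.out.println(toJson(sort));
    }
}
```

The printed object is what would go inside the `sort` array of the search body. Note that each nested level gets its own `path`, and the filter sits on the outer level.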

However, when I examined the response, the "sort" field did not match the document's property value:

{
    "_index": "material-definition-index-v2",
    "_type": "default",
    "_id": "development_LITL4ZCNE",
    "_source": {
        "id": "LITL4ZCNE",
        "description": [
            "CPU, Intel, Cascade Lake, 8259CL, 24C, 210W, B1 Prod"
        ],
        "materialDefinitionProperties": [
            {
                "id": "PART NUMBER",
                "description": [],
                "value": [
                    {
                        "valueString": "202-001193-001",
                        "isOriginal": true
                    }
                ]
            }
        ]
    },
    "sort": [
        "100-000018"
    ]
},

The document's PART NUMBER property is "202-001193-001", but the "sort" field says "100-000018", which is the part number of another document.

It seems that there's a mismatch between the master document and nested object used for sorting.
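
One way to confirm the symptom across a result page is to compare each hit's "sort" value with the values stored in that hit's own `_source`. The sketch below is plain Java with hypothetical class and method names. It assumes the `_source` has already been parsed into nested `Map`/`List` objects, and it hard-codes the relevant part of the sample hit shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic: checks that a hit's sort value comes from its own nested values.
class SortValueCheck {

    /** Collects every valueString stored under the property with the given id. */
    static List<String> ownValues(Map<String, ?> source, String propertyId) {
        List<String> values = new ArrayList<>();
        Object props = source.get("materialDefinitionProperties");
        if (!(props instanceof List)) {
            return values;
        }
        for (Object p : (List<?>) props) {
            if (!(p instanceof Map)) {
                continue;
            }
            Map<?, ?> prop = (Map<?, ?>) p;
            if (!propertyId.equals(prop.get("id")) || !(prop.get("value") instanceof List)) {
                continue;
            }
            for (Object v : (List<?>) prop.get("value")) {
                if (v instanceof Map && ((Map<?, ?>) v).get("valueString") instanceof String) {
                    values.add((String) ((Map<?, ?>) v).get("valueString"));
                }
            }
        }
        return values;
    }

    /** True if the sort value reported for a hit is one of that hit's own values. */
    static boolean sortValueBelongsToHit(Map<String, ?> source, String propertyId, String sortValue) {
        return ownValues(source, propertyId).contains(sortValue);
    }

    /** The relevant part of the sample hit's _source from the question. */
    static Map<String, Object> sampleSource() {
        return Map.<String, Object>of(
                "id", "LITL4ZCNE",
                "materialDefinitionProperties", List.of(
                        Map.<String, Object>of(
                                "id", "PART NUMBER",
                                "value", List.of(Map.of("valueString", "202-001193-001")))));
    }

    public static void main(String[] args) {
        Map<String, Object> source = sampleSource();
        // The sort value Elasticsearch returned for this hit: not one of its own values.
        System.out.println(sortValueBelongsToHit(source, "PART NUMBER", "100-000018"));     // false
        // The value a correct nested sort would have reported.
        System.out.println(sortValueBelongsToHit(source, "PART NUMBER", "202-001193-001")); // true
    }
}
```

Any hit for which the check returns false has a sort key taken from somewhere other than its own nested objects, which is the behaviour described here.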

This request worked well when there were only a small number of documents in the cluster. But once I backfilled the cluster with ~1 million records, the symptom appeared. I've also tried creating a new ES cluster, but the results are the same.

Sorting by other non-nested attributes worked well.

Did I misunderstand the concept of nested objects, or misuse the nested sort feature?

Any ideas appreciated!


Solution

  • This is a bug in Elasticsearch. Upgrading to 6.4.0 fixed the issue.

    Related pull request: https://github.com/elastic/elasticsearch/pull/32204

    Release note: https://www.elastic.co/guide/en/elasticsearch/reference/current/release-notes-6.4.0.html