Tags: sorting, datetime, elasticsearch, nested, elasticsearch-painless

Elasticsearch script: can a nested date be accessed directly as a date object?


In my ES data, there are nested and parent date fields. I need to use both in a single Painless script. Working with the parent date is easy because it is exposed as a date object. For example, to convert the parent date doc['validFrom'] to Unix time in milliseconds I just use:

doc['validFrom'].value.millis

Nested dates such as params._source['offers'][0].validFrom are a different case. They are returned as a String, not a date, so I have to parse them into a date object manually:

ZonedDateTime.of(LocalDateTime.parse(params._source['offers'][0].validFrom), ZoneId.systemDefault()).toInstant().toEpochMilli()

This manual parsing adds complexity to the script, and I suspect it also hurts performance. Can a nested date field be accessed directly as a date object in an Elasticsearch script, without parsing it from a String?

P.S. Data example:

[
    {
        "id": "1",
        "rank": 8,
        "validFrom": "1970-01-01T00:00:00",
        "offers": [
            {
                "id": "777",
                "rank": 12,
                "validFrom": "2020-07-06T00:00:00"  // !!! should take the date from here
            }
        ]
    },
    {
        "id": "2",
        "rank": 35,
        "validFrom": "2019-05-03T00:00:00",  // !!! should take the date from here as offers are null
        "offers": null
    }
]

My script:

    "sort": [
        {
            "_script": {
                "script": {
                    "source": "params._source.offers != null ? ZonedDateTime.of(LocalDateTime.parse(params._source['offers'][0].validFrom), ZoneId.systemDefault()).toInstant().toEpochMilli() : doc['validFrom'].value.millis",
                    "lang": "painless"
                },
                "type": "number",
                "order": "asc"
            }
        }
    ]

Solution

  • This question is related to this one.

    The topic here is differentiating between doc_values and _source fields.

    Since doc_values return typed values (a date object for a date field), you're able to access .millis on the date field. _source, however, is a JSON-like, unanalyzed map of maps, so you only get back what was originally ingested, which here is the date string.

    If performance is a concern, I'd recommend extracting the nested validFrom to the top level and calling it, say, validFromOverride. Your sort script will then become significantly simpler.

    Mappings and document structure don't have to be immutable.
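
    As a rough illustration of the validFromOverride idea, here is a minimal Java sketch of how an indexing client might copy the date to the top level before sending each document. The class and method names are hypothetical, the documents are assumed to arrive as plain maps shaped like the data example in the question, and zone-less dates are assumed to mean UTC (which is how Elasticsearch interprets date strings without a zone). None of this is an Elasticsearch API.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index-time helper; the class and method names are illustrative only.
class ValidFromOverride {

    // Converts a zone-less ISO-8601 string such as "2020-07-06T00:00:00" to
    // epoch milliseconds, interpreting it as UTC (an assumption, see the lead-in).
    static long toEpochMillisUtc(String isoLocalDateTime) {
        return LocalDateTime.parse(isoLocalDateTime)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    // Returns a copy of the document with a top-level "validFromOverride":
    // the first offer's validFrom when one exists, otherwise the parent's validFrom.
    static Map<String, Object> withOverride(Map<String, Object> doc) {
        Map<String, Object> copy = new HashMap<>(doc);
        Object chosen = doc.get("validFrom");
        Object offers = doc.get("offers");
        if (offers instanceof List && !((List<?>) offers).isEmpty()) {
            Object first = ((List<?>) offers).get(0);
            if (first instanceof Map && ((Map<?, ?>) first).get("validFrom") != null) {
                chosen = ((Map<?, ?>) first).get("validFrom");
            }
        }
        copy.put("validFromOverride", chosen);
        return copy;
    }
}
```

    If every document carries validFromOverride and the field is mapped as a date, the sort script could shrink to doc['validFromOverride'].value.millis, and a plain field sort on validFromOverride may be enough.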