Search code examples
elasticsearch-dslelasticsearch-dsl-py

How do I create a runtime field with elasticsearch-dsl?


I have been successfully using python with elasticsearch-dsl (https://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html#) and composite aggregations for some time now but have come across a problem I can't solve. The data I'm querying is asset data for multiple organizations and consists of one row for each asset. Each row includes two dates (scan_date and timestamp) for each asset and I need the search result to give me the maximum of (timestamp - scan_date), i.e., the oldest "scan_age," for each organization. In SQL terms:

SELECT organization, MAX(timestamp - scan_date) as oldest_scan_age
FROM database
GROUP BY organization

The elasticsearch-dsl library includes the function script_fields() to define calculated fields, but from what I've read (and tried) I can't perform aggregations on script fields. The elasticsearch feature I need seems to be "runtime fields" but elasticsearch-dsl does not appear to provide a function I can call to specify a runtime field. If such a function existed, I could call it to define a new field called "scan_age" then find its maximum by agency using a standard composite aggregation.

How can I perform this query using elasticsearch-dsl?

Please note that (1) I already know how to do this using the JSON syntax typically associated with querying elasticsearch. What I'm looking for is how to do it in elasticsearch-dsl. (2) I would happily create my own JSON to define just the runtime field if I had a way to use it programatically to modify my elasticsearch-dsl search object.


Solution

  • There is a pull request to add support to runtime fields, here:

    https://github.com/elastic/elasticsearch-dsl-py/pull/1611/commits

    You can define the runtime field in the mappings, and then just run a regular aggregation using the DSL.

    PUT my-index-000001/
    {
      "mappings": {
        "runtime": {
          "oldest_scan_age": {
            "type": "date",
            "script": {
              "source": "emit(your_field_logic)"
            }
          }
        },
        "properties": {
          "@timestamp": {"type": "date"}
        }
      }
    }
    

    There is an answer that shows how to work with date max:

    https://stackoverflow.com/a/49890932/3112848

    Let me know if it worked or you need more help