Search code examples
datedatetimeelasticsearchelasticsearch-5elasticsearch-painless

How to perform date arithmetic between nested and unnested dates in Elasticsearch?


Consider the following Elasticsearch (v5.4) object (an "award" doc type):

{
  "name": "Gold 1000",
  "date": "2017-06-01T16:43:00.000+00:00",
  "recipient": {
    "name": "James Conroy",
    "date_of_birth": "1991-05-30"
  }
}

The mapping type for both award.date and award.recipient.date_of_birth is "date".

I want to perform a range aggregation to get a list of the age ranges of the recipients of this award ("Under 18", "18-24", "24-30", "30+"), at the time of their award. I tried the following aggregation query:

{
  "size": 0,
  "query": {"match_all": {}},
  "aggs": {
    "recipients": {
      "nested": {
        "path": "recipient"
      },
      "aggs": {
        "age_ranges": {
          "range": {
            "script": {
              "inline": "doc['date'].date - doc['recipient.date_of_birth'].date"
            },
            "keyed": true,
            "ranges": [{
              "key": "Under 18",
              "from": 0,
              "to": 18
            }, {
              "key": "18-24",
              "from": 18,
              "to": 24
            }, {
              "key": "24-30",
              "from": 24,
              "to": 30
            }, {
              "key": "30+",
              "from": 30,
              "to": 100
            }]
          }
        }
      }
    }
  }
}

Problem 1

But I get the following error due to the comparison of dates in the script portion:

Cannot apply [-] operation to types [org.joda.time.DateTime] and [org.joda.time.MutableDateTime].

The DateTime object is the award.date field, and the MutableDateTime object is the award.recipient.date_of_birth field. I've tried doing something like doc['recipient.date_of_birth'].date.toDateTime() (which doesn't work despite the Joda docs claiming that MutableDateTime has this method inherited from a parent class). I've also tried doing something further like this:

"script": "ChronoUnit.YEARS.between(doc['date'].date, doc['recipient.date_of_birth'].date)"

Which sadly also doesn't work :(

Problem 2

I notice if I do this:

"aggs": {
  "recipients": {
    "nested": {
      "path": "recipient"
    },
    "aggs": {
      "award_years": {
        "terms": {
          "script": {
            "inline": "doc['date'].date.year"
          }
        }
      }
    }
  }
}

I get 1970 with a doc_count that happens to equal the total number of docs in ES. This leads me to believe that accessing a property outside of the nested object simply does not work and gives me back some default like the epoch datetime. And if I do the opposite (aggregating dates of birth without nesting), I get the exact same thing for all the dates of birth instead (1970, epoch datetime). So how can I compare those two dates?

I am racking my brain here, and I feel like there's some clever solution that is just beyond my current expertise with Elasticsearch. Help!

If you want to set up a quick environment for this to help me out, here is some curl goodness:

curl -XDELETE http://localhost:9200/joelinux
curl -XPUT http://localhost:9200/joelinux -d "{\"mappings\": {\"award\": {\"properties\": {\"name\": {\"type\": \"string\"}, \"date\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ\"}, \"recipient\": {\"type\": \"nested\", \"properties\": {\"name\": {\"type\": \"string\"}, \"date_of_birth\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd\"}}}}}}}"
curl -XPUT http://localhost:9200/joelinux/award/1 -d '{"name": "Gold 1000", "date": "2016-06-01T16:43:00.000000+00:00", "recipient": {"name": "James Conroy", "date_of_birth": "1991-05-30"}}'
curl -XPUT http://localhost:9200/joelinux/award/2 -d '{"name": "Gold 1000", "date": "2017-02-28T13:36:00.000000+00:00", "recipient": {"name": "Martin McNealy", "date_of_birth": "1983-01-20"}}'

That should give you a "joelinux" index with two "award" docs to test this out ("James Conroy" and "Martin McNealy"). Thanks in advance!


Solution

  • Unfortunately, you can't access nested and non-nested fields within the same context. As a workaround, you can change your mapping to automatically copy date from nested document to root context using copy_to option:

    {
        "mappings": {
            "award": {
                "properties": {
                    "name": {
                        "fields": {
                            "keyword": {
                                "ignore_above": 256,
                                "type": "keyword"
                            }
                        },
                        "type": "text"
                    },
                    "date": {
                        "type": "date"
                    },
                    "date_of_birth": {
                        "type": "date" // will be automatically filled when indexing documents
                    },
                    "recipient": {
                        "properties": {
                            "name": {
                                "fields": {
                                    "keyword": {
                                        "ignore_above": 256,
                                        "type": "keyword"
                                    }
                                },
                                "type": "text"
                            },
                            "date_of_birth": {
                                "type": "date",
                                "copy_to": "date_of_birth" // copy value to root document
                            }
                        },
                        "type": "nested"
                    }
                }
            }
        }
    }
    

    After that you can access date of birth using path date, though the calculations to get number of years between dates are slightly tricky:

    Period.between(LocalDate.ofEpochDay(doc['date_of_birth'].date.getMillis() / 86400000L), LocalDate.ofEpochDay(doc['date'].date.getMillis() / 86400000L)).getYears()
    

    Here I convert original JodaTime date objects to system.time.LocalDate objects:

    1. Get number of milliseconds from 1970-01-01
    2. Convert to number of days from 1970-01-01 by dividing it to 86400000L (number of ms in one day)
    3. Convert to LocalDate object
    4. Create date-based Period object from two dates
    5. Get number of years between two dates.

    So, the final aggregation query looks like this:

    {
        "size": 0,
        "query": {
            "match_all": {}
        },
        "aggs": {
            "age_ranges": {
                "range": {
                    "script": {
                        "inline": "Period.between(LocalDate.ofEpochDay(doc['date_of_birth'].date.getMillis() / 86400000L), LocalDate.ofEpochDay(doc['date'].date.getMillis() / 86400000L)).getYears()"
                    },
                    "keyed": true,
                    "ranges": [
                        {
                            "key": "Under 18",
                            "from": 0,
                            "to": 18
                        },
                        {
                            "key": "18-24",
                            "from": 18,
                            "to": 24
                        },
                        {
                            "key": "24-30",
                            "from": 24,
                            "to": 30
                        },
                        {
                            "key": "30+",
                            "from": 30,
                            "to": 100
                        }
                    ]
                }
            }
        }
    }