
Elasticsearch Painless - Group all documents by a field, and do a calculation on all values


I have a 'vehicles' index pattern; every document has a plate number ('plateNo'), a 'position', and a 'date' field. As a test, I want to group all the documents in this index by plate number, sort each group by date, and finally calculate the distance every vehicle moved by summing the absolute differences between every two consecutive positions.

For example:

plateNo    position  date
vehicle 1  1         May 16, 2021 @ 15:55:37
vehicle 2  7         May 16, 2021 @ 15:55:05
vehicle 1  5         May 16, 2021 @ 15:54:30
vehicle 2  10        May 16, 2021 @ 15:53:01
vehicle 1  2         May 16, 2021 @ 15:50:41

The output must be:

plateNo    distance
vehicle 1  abs(5 - 2) + abs(1 - 5) = 7
vehicle 2  abs(7 - 10) = 3

How can I do that with Painless? A fast response is important, and the number of documents is very large. Thanks.


Solution

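  • Assumptions:

    • All documents of the same vehicle are on a single shard (for example, because they were indexed with the plate number as the routing value); otherwise the position change between documents on different shards is not counted
    • The date format defined in the mapping is 'epoch_millis'
    • If a vehicle has only a single entry, its distance is zero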

    Aggregation Used: scripted_metric

    Steps:

    • Init: Initialize a TreeMap that stores the timestamp as key and pos as value
    • Map: Put each document's timestamp and pos into the TreeMap
    • Combine: Iterate over the values (the TreeMap returns pos sorted by timestamp) and sum the absolute differences between consecutive positions
    • Reduce: Sum the distances returned by each shard and return the total
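
To make the four steps concrete, here is a minimal Python sketch that simulates them outside Elasticsearch on the sample data from the question. It is not Painless and not how Elasticsearch executes the aggregation internally; the function names (`init_state`, `map_doc`, `combine`, `reduce_states`, `distance_per_vehicle`) and the way shards are modeled as plain lists are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Sample documents from the question: (plate, position, date).
DOCS = [
    ("vehicle 1", 1, datetime(2021, 5, 16, 15, 55, 37)),
    ("vehicle 2", 7, datetime(2021, 5, 16, 15, 55, 5)),
    ("vehicle 1", 5, datetime(2021, 5, 16, 15, 54, 30)),
    ("vehicle 2", 10, datetime(2021, 5, 16, 15, 53, 1)),
    ("vehicle 1", 2, datetime(2021, 5, 16, 15, 50, 41)),
]


def init_state():
    # Init: an empty date -> position map plus a running distance.
    return {"dt_point_map": {}, "distance": 0}


def map_doc(state, date, pos):
    # Map: record the position under its timestamp.
    state["dt_point_map"][date] = pos


def combine(state):
    # Combine: visit positions in timestamp order (what the TreeMap gives
    # for free) and sum the absolute differences of consecutive positions.
    prev = None
    for date in sorted(state["dt_point_map"]):
        pos = state["dt_point_map"][date]
        if prev is not None:
            state["distance"] += abs(pos - prev)
        prev = pos
    return state["distance"]


def reduce_states(states):
    # Reduce: add up the per-shard results.
    return sum(states)


def distance_per_vehicle(shards):
    # Each element of `shards` is a list of documents held by one
    # hypothetical shard; the grouping by plate plays the role of the
    # terms aggregation.
    per_shard_results = defaultdict(list)
    for shard_docs in shards:
        states = {}
        for plate, pos, date in shard_docs:
            map_doc(states.setdefault(plate, init_state()), date, pos)
        for plate, state in states.items():
            per_shard_results[plate].append(combine(state))
    return {p: reduce_states(r) for p, r in per_shard_results.items()}


# All documents on one shard: matches the expected output in the question.
print(distance_per_vehicle([DOCS]))  # {'vehicle 1': 7, 'vehicle 2': 3}

# Same documents split across two shards: the totals come out too low.
print(distance_per_vehicle([DOCS[:3], DOCS[3:]]))  # {'vehicle 1': 4, 'vehicle 2': 0}
```

The second call illustrates why the single-shard assumption matters: each shard only sees its own documents, so the movement between positions stored on different shards is never added to the sum.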

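    Sample Query (written against an index named 'vehicle' with fields 'vehicle', 'pos', and 'date'; substitute your own index and field names, such as 'plateNo' and 'position'):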

      GET vehicle/_search
      {
        "size": 0,
        "aggs": {
          "vehicles": {
            "terms": {
              "field": "vehicle",
              "size": 10
            },
            "aggs": {
              "distance": {
                "scripted_metric": {
                  "init_script": "state.dt_point_map=new TreeMap(); state.distance=0; ",
                  "map_script": "state.dt_point_map.put(doc.date.value,doc.pos.value);",
                  "combine_script": "int i=0;long prev=0; for(p in state.dt_point_map.values()){if(i==0){prev=p;i++;}else{state.distance+=Math.abs(p-prev);prev=p;i++;}} return state.distance;",
                  "reduce_script": "double overallDistance = 0; for (distance in states) { overallDistance += distance } return overallDistance;"
                }
              }
            }
          }
        }
      }
    

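    Recommendation: Since fast responses matter and the number of documents is very large, precompute the distance per vehicle and store it for fast access instead of calculating it at query time.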
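
One possible way to precompute, sketched here in Python, is to keep a running total per vehicle as position updates come in and store that total alongside the data. The `RunningDistance` class and its methods are hypothetical names for illustration, and the sketch assumes that updates for each vehicle arrive in timestamp order; out-of-order updates would need extra handling that is not shown.

```python
class RunningDistance:
    """Keeps the last position and accumulated distance per vehicle.

    Assumes updates for a given vehicle arrive in timestamp order.
    """

    def __init__(self):
        self._last_pos = {}  # plate -> most recent position
        self._total = {}     # plate -> accumulated distance

    def record(self, plate, pos):
        # Add the absolute move from the previous position, if any.
        last = self._last_pos.get(plate)
        moved = abs(pos - last) if last is not None else 0
        self._total[plate] = self._total.get(plate, 0) + moved
        self._last_pos[plate] = pos
        return self._total[plate]

    def distance(self, plate):
        # A vehicle with no updates or a single update has distance zero.
        return self._total.get(plate, 0)


tracker = RunningDistance()
# Sample data from the question, replayed in timestamp order.
for plate, pos in [
    ("vehicle 1", 2),
    ("vehicle 2", 10),
    ("vehicle 1", 5),
    ("vehicle 2", 7),
    ("vehicle 1", 1),
]:
    tracker.record(plate, pos)

print(tracker.distance("vehicle 1"))  # 7
print(tracker.distance("vehicle 2"))  # 3
```

With the total stored per vehicle, reading a distance becomes a simple lookup rather than a scripted aggregation over every document.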