Tags: elasticsearch, elasticsearch-aggregation, elasticsearch-dsl, elasticsearch-query, elasticsearch-5

Elasticsearch Aggregation: Sum of latest children per parent


I have a parent-child structure in Elasticsearch in which each order has order_revision children. I want to generate a histogram over price that shows the sum of quantity for each price bucket.

{
  "_type": "order",
  "_id": "1063220887",
  "_score": 1,
  "_source": {
    "order_id": "1063220887",
    "product_id": "10446350",
    "timestamp": 1462713302000
  }
}

{
  "_type": "order_revision",
  "_id": "10234234",
  "_parent": "1063220887",
  "_source": {
    "price": 9,
    "quantity": 3,
    "revision": 361,
    "timestamp": 1462712196000
  }
}

The following aggregation basically works, but it returns the sum over all existing revisions.

  {
    "aggs": {
      "orders": {
        "filter": {
          "has_parent": {
            "parent_type": "order",
            "query": {"match_all": {}}
          }
        },
        "aggs": {
          "quantity_per_price": {
            "histogram": {
              "field": "price",
              "interval": 1
            },
            "aggs": {
              "total_quantity": {
                "sum": {"field": "quantity"}
              }
            }
          }
        }
      }
    }
  }

The final version should return the sum of the quantity fields for only the latest revision (the one with the highest timestamp) of each order. I'm not sure how to write an aggregation that groups by order_id and selects only the latest child, nor am I sure that this parent-child structure is the best way to model the data.


Solution

  • The easiest implementation is to tag the latest revision in the document itself ("latest": true). Restricting the result to the latest revisions then only requires adding a query or a filter aggregation on that field.
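
As a rough sketch of what that could look like, the request below adds a term filter on the flag to the filter aggregation from the question. It assumes that order_revision documents carry a boolean field named latest and that it is mapped as a boolean. The aggregation names latest_revisions and total_quantity are illustrative only.

```json
{
  "size": 0,
  "aggs": {
    "latest_revisions": {
      "filter": {
        "bool": {
          "filter": [
            { "term": { "latest": true } },
            {
              "has_parent": {
                "parent_type": "order",
                "query": { "match_all": {} }
              }
            }
          ]
        }
      },
      "aggs": {
        "quantity_per_price": {
          "histogram": { "field": "price", "interval": 1 },
          "aggs": {
            "total_quantity": { "sum": { "field": "quantity" } }
          }
        }
      }
    }
  }
}
```

This only gives correct sums if exactly one revision per order carries the flag, so whatever indexes a new revision also has to set "latest": false on the previous one.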