Search code examples
elasticsearchelasticsearch-aggregation

Filtering, sorting and paginating by sub-aggregations in ElasticSearch 6


I have a collection of documents, where each document indicates the available rooms for a given hotel and day, and their cost for that day:

{
    "hotel_id": 2016021519381313,
    "day": "20200530",
    "rooms": [
        {
            "room_id": "00d70230ca0142a6874358919336e53f",
            "rate": 87
        },
        {
            "room_id": "675a5ec187274a45ae7a5fdc20f72201",
            "rate": 53
        }
    ]
}

Being the mapping:

{
    "properties": {
        "day": {
            "type": "keyword"
        },
        "hotel_id": {
            "type": "long"
        },
        "rooms": {
            "type": "nested",
            "properties": {
                "rate": {
                    "type": "long"
                },
                "room_id": {
                    "type": "keyword"
                }
            }
        }
    }
}

I am trying to figure out, how to do a query where I can get the available rooms for a set of days which total cost is less than a given amount, ordered by total cost in ascending order and paginated.

So far I came up with the way of getting rooms available for the set of days and their total cost. Basically filtering by the days, and grouping per hotel and room IDs, requiring that the minimum count in the aggregation is the number of days I am looking for.

{
    "size" : 0,
    "query": {
        "bool": { 
            "must": [
                {
                    "terms" : {
                        "day" : ["20200423", "20200424", "20200425"]
                    }
                }
            ]
        } 
    } ,
    "aggs" : {
        "hotel" : {
            "terms" : { 
                "field" : "hotel_id"
            },
            "aggs" : {
                "rooms" : {
                    "nested" : {
                        "path" : "rooms"
                    },
                    "aggs" : {
                        "rooms" : {
                            "terms" : {
                                "field" : "rooms.room_id",
                                "min_doc_count" : 3
                            },
                            "aggs" : {
                                "sum_price" : { 
                                    "sum" : { "field" : "rooms.rate" } }
                            }
                        }

                    }
                }
            }
        }
    }
}

So now I am interesting in ordering the result buckets in descending order at the "hotel" level based on the value of the sub-aggregation with "rooms", and also filtering the buckets that do not contains enough documents or which "sum_price" is bigger than a given budget. But I cannot manage how to do it.

I have been taking a look at "bucket_sort", but I cannot find the way to sort in base a subaggregation. I have been also taking a look to "bucket_selector", but it gives me empty buckets when they do not fit the predicate. I am probably not using them correctly in my case.

Which would be the right way of accomplish it?


Solution

  • Here is the query without pagination:

    {
       "size":0,
       "query":{
          "bool":{
             "must":[
                {
                   "terms":{
                      "day":[
                         "20200530",
                         "20200531",
                         "20200532"
                      ]
                   }
                }
             ]
          }
       },
       "aggs":{
          "rooms":{
             "nested":{
                "path":"rooms"
             },
             "aggs":{
                "rooms":{
                   "terms":{
                      "field":"rooms.room_id",
                      "min_doc_count":3,
                      "order":{
                         "sum_price":"asc"
                      }
                   },
                   "aggs":{
                      "sum_price":{
                         "sum":{
                            "field":"rooms.rate"
                         }
                      },
                      "max_price":{
                         "bucket_selector":{
                            "buckets_path":{
                               "var1":"sum_price"
                            },
                            "script":"params.var1 < 100"
                         }
                      }
                   }
                }
             }
          }
       }
    }
    

    Please note that the following variables should be changed for the desired results:

    • day
    • min_doc_count
    • script in max_price