elasticsearch, e-commerce, filtering, faceted-search

Aggregations in a filter only returning values if present in all filtered documents?


For an e-commerce filtering system, aggregations in Elasticsearch have to be calculated over the entire filtered set.

No filters, we get:

"filters":{  
    "colour":{  
        "red":{  
            "count":5
        },
        "blue":{  
            "count":4
        }
    },
    "size":{  
        "L":{  
            "count":16
        },
        "M":{  
            "count":15
        }
    }
}

Filter on red only, we get:

"filters":{  
    "colour":{  
        "red":{  
            "count":5
        },
        "blue":{  
            "count":3
        }
    },
    "size":{  
        "L":{  
            "count":1
        },
        "M":{  
            "count":1
        }
    }
}

Filter on blue only, we get:

"filters":{  
    "colour":{  
        "blue":{  
            "count":4
        },
        "red":{  
            "count":3
        }
    }
}

There is no size for blue, and that's OK. But, if we combine the two:

"filters":{  
    "colour":{  
        "red":{  
            "count":5
        },
        "blue":{  
            "count":4
        }
    },
    "size":{  
        "L":{  
            "count":1
        },
        "M":{  
            "count":1
        }
    }
}

Size shouldn't be returned at all, because it doesn't match both colours.

This is what is being sent to ES:

{  
    "size":1000,
    "fields":[  
        "id",
        "name",
        "price",
        "colour",
        "size"
    ],
    "query":{  
        "filtered":{  
            "query":{  
                "match_all":{}
            },
            "filter":{  
                "bool":{  
                    "must":[  
                        {  
                            "term":{  
                                "categories":4838
                            }
                        },
                        {  
                            "bool":{  
                                "should":[  
                                    {  
                                        "term":{  
                                            "colour":"blue"
                                        }
                                    },
                                    {  
                                        "term":{  
                                            "colour":"red"
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs":{  
        "price":{  
            "stats":{  
                "field":"price"
            }
        },
        "discounted":{  
            "terms":{  
                "field":"discounted"
            }
        },
        "stock":{  
            "filter":{  
                "range":{  
                    "stock":{  
                        "gt":0
                    }
                }
            }
        },
        "colour":{  
            "terms":{  
                "field":"colour"
            }
        },
        "size":{  
            "terms":{  
                "field":"size"
            }
        }
    }
}

What is going wrong? How do you set it up to aggregate only on the filtered subset where it matches all documents?


Solution

  • The result is consistent with the query:

    {
      "bool": {
        "must": [{
          "term": {
            "categories": 4838
          }
        }, {
          "bool": {
            "should": [{
              "term": {
                "colour": "blue"
              }
            }, {
              "term": {
                "colour": "red"
              }
            }]
          }
        }]
      }
    }
    

    You're asking for items in category 4838 that are blue OR red. When you aggregate on size, a bucket is created for each size found in the matching documents, and that set contains both blue and red items of category 4838. That's why the sizes of the red items appear in the result.
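
To make the OR behaviour concrete, here is a minimal sketch in plain Python (not the Elasticsearch client). The `PRODUCTS` list and the helpers `matches` and `terms_buckets` are hypothetical stand-ins invented for illustration; they only mimic the must/should filter and a terms aggregation on a handful of made-up documents.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for indexed products (not data from the question).
PRODUCTS = [
    {"id": 1, "categories": [4838], "colour": "red", "size": "L"},
    {"id": 2, "categories": [4838], "colour": "red", "size": "M"},
    {"id": 3, "categories": [4838], "colour": "blue", "size": None},
    {"id": 4, "categories": [4838], "colour": "green", "size": "S"},
    {"id": 5, "categories": [1], "colour": "red", "size": "XL"},
]


def matches(doc, category, colours):
    """Mimic must(term categories) + should(term colour, ...): category AND any colour."""
    return category in doc["categories"] and doc["colour"] in colours


def terms_buckets(docs, field):
    """Mimic a terms aggregation: one bucket per distinct value, skipping docs without the field."""
    return dict(Counter(d[field] for d in docs if d[field] is not None))


hits = [d for d in PRODUCTS if matches(d, 4838, {"red", "blue"})]
size_buckets = terms_buckets(hits, "size")
print(size_buckets)  # {'L': 1, 'M': 1}
```

The blue item has no size, yet the size buckets are still filled by the red items, because the aggregation runs over every document the OR filter lets through.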

    How do you set it up to aggregate only on the filtered subset where it matches all documents?

    I'm not sure I understand your question, since your filtered subset contains both red and blue items. You could use sub-aggregations (https://www.elastic.co/blog/intro-to-aggregations-pt-2-sub-aggregations) to get sizes per colour in your result.

    UPDATE 1:

    If blue and red are selected, and M is only for blue products, but L is for both red and blue, I want to only return L

    So you want size to be dependent on colour:

    "aggs": {
      "price": {
        "stats": {
          "field": "price"
        }
      },
      "discounted": {
        "terms": {
          "field": "discounted"
        }
      },
      "stock": {
        "filter": {
          "range": {
            "stock": {
              "gt": 0
            }
          }
        }
      },
      "colour": {
        "terms": {
          "field": "colour"
        },
        "aggs": {
          "size": {
            "terms": {
              "field": "size"
            }
          }
        }
      }
    }
    

    You can use a sub-aggregation to do so (https://www.elastic.co/blog/intro-to-aggregations-pt-2-sub-aggregations)
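
One possible way to get "only L" out of the nested result is to intersect the size buckets of the selected colours on the client. The sketch below assumes the usual terms-aggregation response shape (`aggregations` → `colour` → `buckets`, each with a nested `size` → `buckets`); the helper name `sizes_common_to` and the sample response are hypothetical, chosen to mirror the example in UPDATE 1.

```python
def sizes_common_to(response, selected_colours):
    """Return the sizes that appear under every selected colour bucket."""
    buckets = response["aggregations"]["colour"]["buckets"]
    by_colour = {
        b["key"]: {s["key"] for s in b["size"]["buckets"]}
        for b in buckets
    }
    # A selected colour without a bucket contributes an empty set.
    size_sets = [by_colour.get(c, set()) for c in selected_colours]
    return sorted(set.intersection(*size_sets)) if size_sets else []


# Hypothetical response: L exists for red and blue, M only for blue.
SAMPLE_RESPONSE = {
    "aggregations": {
        "colour": {
            "buckets": [
                {"key": "red", "doc_count": 5,
                 "size": {"buckets": [{"key": "L", "doc_count": 1}]}},
                {"key": "blue", "doc_count": 4,
                 "size": {"buckets": [{"key": "L", "doc_count": 2},
                                      {"key": "M", "doc_count": 1}]}},
            ]
        }
    }
}

print(sizes_common_to(SAMPLE_RESPONSE, ["red", "blue"]))  # ['L']
```

This keeps the query unchanged and moves the "present for all selected colours" rule into post-processing, which may or may not suit a given setup.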

    UPDATE 2:

    Final answer after discussion: keep the hits of the OR query, but run the size aggregation only over documents that match the AND of the selected colours:

    {
      "aggs": {
        "colorsizes": {
          "filters": {
            "filters": [{
              "bool": {
                "must": [{
                  "term": {
                    "colour": "red"
                  }
                }, {
                  "term": {
                    "colour": "blue"
                  }
                }]
              }
            }]
          },
          "aggs": {
            "size": {
              "terms": {
                "field": "size"
              }
            }
          }
        }
      }
    }
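
With more than two selectable colours, the aggregation body would have to be generated rather than written by hand. The following is a rough sketch of such a builder, assuming the term filters are meant to target the `colour` field; the function name `build_colour_and_aggregation` and its parameters are hypothetical. Note that a `must` with several `term` clauses on the same field only matches documents that carry all of those values, so this presumes `colour` is multi-valued per document.

```python
import json


def build_colour_and_aggregation(selected_colours, colour_field="colour", size_field="size"):
    """Build a filters aggregation that ANDs one term filter per selected colour."""
    must = [{"term": {colour_field: c}} for c in selected_colours]
    return {
        "aggs": {
            "colorsizes": {
                # Anonymous filters array: a single bucket for docs matching every colour.
                "filters": {"filters": [{"bool": {"must": must}}]},
                # Sizes are counted only inside that bucket.
                "aggs": {"size": {"terms": {"field": size_field}}},
            }
        }
    }


body = build_colour_and_aggregation(["red", "blue"])
print(json.dumps(body, indent=2))
```

The resulting `aggs` section can be merged into the existing request body next to the OR filter, so the hits stay the same while only the size buckets are restricted.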