Tags: elasticsearch, aggregation, recommendation-engine, elasticsearch-aggregation, significant-terms

Exclude Significant Term Aggregation With Different Field


Is it possible to filter the bucket list returned by a significant terms aggregation using fields other than the one being aggregated? I am trying to build a recommendation feature with Elasticsearch, based on this Medium article: https://towardsdatascience.com/how-to-build-a-recommendation-engine-quick-and-simple-aec8c71a823e.

I store the search data as an array of objects instead of an array of strings, because I need the other fields for filtering in order to get the correct bucket list. Here is the index mapping:

{
  "mapping": {
    "properties": {
      "user": {
        "type": "keyword",
        "ignore_above": 256
      },
      "comic_subscribes": {
        "properties": {
          "genres": {
            "type": "keyword",
            "ignore_above": 256
          },
          "id": {
            "type": "keyword",
            "ignore_above": 256
          },
          "type": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
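
To make the mapping concrete, here is a small Python sketch of one document and of how it is indexed. The document values are invented for illustration, and flatten_object_array is a hypothetical helper, not part of any Elasticsearch client. It assumes the mapping above is used as shown, without "type": "nested", in which case Elasticsearch indexes an array of objects as one multi-value field per sub-field and does not keep track of which values came from the same object.

```python
# Hypothetical document matching the mapping; all values are invented.
doc = {
    "user": "user-42",
    "comic_subscribes": [
        {"id": "1", "type": "serial", "genres": ["action"]},
        {"id": "2", "type": "oneshot", "genres": ["comedy"]},
    ],
}


def flatten_object_array(document, field):
    """Mimic how a non-nested object array is indexed: one multi-value
    field per sub-field, with the per-object pairing discarded."""
    flat = {}
    for obj in document[field]:
        for key, value in obj.items():
            values = value if isinstance(value, list) else [value]
            flat.setdefault(f"{field}.{key}", []).extend(values)
    return flat


indexed_view = flatten_object_array(doc, "comic_subscribes")
print(indexed_view)
```

In this flattened view the document as a whole contains both "serial" and the ids "1" and "2", so a condition on comic_subscribes.type selects or rejects the entire document rather than individual subscriptions.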

I have two conditions to apply:

  1. comic_subscribes.type must be "serial" only
  2. comic_subscribes.genres must not be "hentai" or "echii"

I have already tried two ways of applying the conditions. First, I tried filtering with a bool query like this:

{
    "size": 0,
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "comic_subscribes.id": "1"
                    }
                }
            ],
            "minimum_should_match": 1,
            "filter": {
                "term": {
                    "comic_subscribes.type": "serial"
                }
            },
            "must_not": [
                {
                    "bool": {
                        "should": [
                            {
                                "term": {
                                    "comic_subscribes.genres": "hentai"
                                }
                            },
                            {
                                "term": {
                                    "comic_subscribes.genres": "echii"
                                }
                            }
                        ],
                        "minimum_should_match": 1
                    }
                }
            ]
        }
    },
    "aggs": {
        "recommendations": {
            "significant_terms": {
                "field": "comic_subscribes.id",
                "exclude": ["1"],
                "min_doc_count": 1,
                "size": 10
            }
        }
    }
}

Then I tried the filter aggregation method:

{
    "size": 0,
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "comic_subscribes.id": "1"
                    }
                }
            ],
            "minimum_should_match": 1
        }
    },
    "aggs": {
        "filtered": {
            "filter": {
                "bool": {
                    "filter": {
                        "term": {
                            "comic_subscribes.type": "serial"
                        }
                    },
                    "must_not": [
                        {
                            "bool": {
                                "should": [
                                    {
                                        "term": {
                                            "comic_subscribes.genres": "hentai"
                                        }
                                    },
                                    {
                                        "term": {
                                            "comic_subscribes.genres": "echii"
                                        }
                                    }
                                ],
                                "minimum_should_match": 1
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "recommendations": {
                    "significant_terms": {
                        "field": "comic_subscribes.id",
                        "exclude": ["1"],
                        "min_doc_count": 1,
                        "size": 10
                    }
                }
            }
        }
    }
}

But both methods still give me unfiltered comic bucket lists. Is there any other way to enforce these conditions? Should I create one more field that stores a pre-filtered comic list and use it as the source field for the significant terms aggregation? Thank you very much.


Solution

  • OK, I think there is no option to filter the bucket list of a significant terms aggregation using a different field.

    According to the Elasticsearch documentation, the Significant Terms Aggregation parameters refer to the "Filtering Values" section of the Terms Aggregation. The options there (regular expressions, partition expressions, and exact values, the last being the "exclude" param I was already using above) all filter on the term values themselves, not on another field.

    So I worked around it: I get the ids of the comics I want to exclude, store them in an array variable named excludeComics, and pass that variable as the exclude param. And there you go: a filtered significant terms aggregation bucket list.
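
    The workaround above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the answer does not say where the comic metadata comes from, so the catalog list, the helper names build_exclude_list and build_request, and the sample values are all hypothetical. The sketch only builds the request body; sending it to Elasticsearch is left out.

```python
import json

EXCLUDED_GENRES = {"hentai", "echii"}  # genre values as spelled in the question
ALLOWED_TYPE = "serial"


def build_exclude_list(comics, seed_id):
    """Collect ids that must not appear as buckets: the seed comic itself
    plus every comic that breaks the type or genre condition."""
    exclude = {seed_id}
    for comic in comics:
        wrong_type = comic["type"] != ALLOWED_TYPE
        bad_genre = any(genre in EXCLUDED_GENRES for genre in comic["genres"])
        if wrong_type or bad_genre:
            exclude.add(comic["id"])
    return sorted(exclude)


def build_request(seed_id, exclude_comics):
    """Build the search body, passing the id list as the exclude param."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "should": [{"term": {"comic_subscribes.id": seed_id}}],
                "minimum_should_match": 1,
            }
        },
        "aggs": {
            "recommendations": {
                "significant_terms": {
                    "field": "comic_subscribes.id",
                    "exclude": exclude_comics,
                    "min_doc_count": 1,
                    "size": 10,
                }
            }
        },
    }


# Hypothetical comic metadata; in practice this would come from wherever
# the comics' type and genres are stored.
catalog = [
    {"id": "1", "type": "serial", "genres": ["action"]},
    {"id": "2", "type": "oneshot", "genres": ["comedy"]},
    {"id": "3", "type": "serial", "genres": ["hentai"]},
    {"id": "4", "type": "serial", "genres": ["drama"]},
]

exclude_comics = build_exclude_list(catalog, "1")  # plays the role of excludeComics
request_body = build_request("1", exclude_comics)
print(json.dumps(request_body, indent=2))
```

    With this sample catalog, comics "2" and "3" are excluded alongside the seed comic "1", so only comic "4" remains eligible as a bucket. The exclude list grows with the number of disallowed comics, which may matter for large catalogs.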