elasticsearch, opensearch

Elasticsearch Rank based on rarity of a field value


I'd like to know how I can rank items lower when they have field values that appear frequently among the results. Say we have a result set like this:

  "name": "Red T-Shirt"
  "store": "Zara"

  "name": "Yellow T-Shirt"
  "store": "Zara"

  "name": "Red T-Shirt"
  "store": "Bershka"

  "name": "Green T-Shirt"
  "store": "Benetton"

I'd like to rank the documents so that those with a frequently occurring field value ("store" in this case) are deboosted and appear lower in the results. The goal is some variety, so that the top results don't all come from the same store.

In the example above, if I search for "T-Shirt", I want to see one Zara T-Shirt at the top, and the rest of the Zara T-Shirts should appear lower, after the results from all the other stores.
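To make the desired order concrete, here is a minimal client-side sketch in Python. It is not an Elasticsearch feature, only an illustration of the ranking I am after: `diversify` is a hypothetical helper, and the input is assumed to be a list of hits already sorted by relevance.

```python
from collections import defaultdict


def diversify(hits, key="store"):
    """Stable re-rank: the n-th hit from each store is placed in tier n.

    `hits` is assumed to be sorted by relevance already; within a tier
    the original order is preserved.
    """
    seen = defaultdict(int)  # number of hits seen so far per store
    tiered = []
    for position, hit in enumerate(hits):
        tier = seen[hit[key]]
        seen[hit[key]] += 1
        tiered.append((tier, position, hit))
    # Sort by tier first, then by original position.
    tiered.sort(key=lambda item: (item[0], item[1]))
    return [hit for _, _, hit in tiered]


# The example result set from above.
hits = [
    {"name": "Red T-Shirt", "store": "Zara"},
    {"name": "Yellow T-Shirt", "store": "Zara"},
    {"name": "Red T-Shirt", "store": "Bershka"},
    {"name": "Green T-Shirt", "store": "Benetton"},
]
ranked = diversify(hits)
print([hit["store"] for hit in ranked])
```

With this input the second Zara item moves to the end, after Bershka and Benetton. What I am looking for is a way to get the same effect inside the search engine.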

So far I have looked into using aggregation buckets for sorting and into script-based sorting, but without success. Is it possible to achieve this inside the search engine?

Many thanks in advance!


Solution

  • This is possible by combining a diversified sampler aggregation with a top hits aggregation, as I learned from the Elastic forum. I don't know what the performance implications are on a high-load production system. Here is a code example; use at your own risk:

    {
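      "query": { "match": { "name": "T-Shirt" } }, // example query from the question; substitute your own
      "size": 0, // regular hits are not needed; the results come from the aggregation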
      "aggs": {
        "my_unbiased_sample": {
          "diversified_sampler": {
            "shard_size": 100,
            "field": "store"
          },
          "aggs": {
            "keywords": {
              "top_hits": {
                "_source": {
                  "includes": [ "name", "store" ]
                },
                "size": 100
              }
            }
          }
        }
      }
    }
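Since `size` is 0, the documents come back inside the aggregation rather than in the usual `hits` array. A minimal Python sketch for pulling them out might look like the following. `extract_sampled_hits` is a hypothetical helper, and `sample_response` is a hand-written fragment for illustration, not real output from a cluster. It assumes the response has already been parsed into a dict and that the aggregation names match the request above.

```python
def extract_sampled_hits(response, sampler="my_unbiased_sample", top_hits="keywords"):
    """Return the _source of each document nested under sampler -> top_hits."""
    top = response["aggregations"][sampler][top_hits]
    return [hit["_source"] for hit in top["hits"]["hits"]]


# Hand-written response fragment (illustrative only, scores are made up).
sample_response = {
    "aggregations": {
        "my_unbiased_sample": {
            "doc_count": 3,
            "keywords": {
                "hits": {
                    "hits": [
                        {"_score": 1.0, "_source": {"name": "Red T-Shirt", "store": "Zara"}},
                        {"_score": 0.9, "_source": {"name": "Red T-Shirt", "store": "Bershka"}},
                        {"_score": 0.8, "_source": {"name": "Green T-Shirt", "store": "Benetton"}},
                    ]
                }
            },
        }
    }
}

docs = extract_sampled_hits(sample_response)
print([doc["store"] for doc in docs])
```

One caveat: `diversified_sampler` has a `max_docs_per_value` parameter that defaults to 1, so with the request above each store contributes at most one document per shard. The remaining items from a frequent store are left out of the sample rather than ranked lower. Raising `max_docs_per_value` lets more through, but I have not verified how that affects their ordering.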