
Elasticsearch - bring documents with distinct values to the top of results


Let's say, for example, I have a 'books' index and each book has an author_id. Because there are only a few authors, author IDs repeat frequently across the books. Books in my index look something like this:

{
    "title": "Elasticsearch for dummies",
    "author_id": 1,
    "purchases": 10
},
{
    "title": "Great book",
    "author_id": 1,
    "purchases": 5
},
{
    "title": "Great book 2",
    "author_id": 1,
    "purchases": 8
},
{
    "title": "My cool book",
    "author_id": 2,
    "purchases": 14
},
{
    "title": "Interesting book title",
    "author_id": 2,
    "purchases": 20
},
{
    "title": "amazing book",
    "author_id": 2,
    "purchases": 16
},
{
    "title": "Silly Walks vol II",
    "author_id": 3,
    "purchases": 13
},
{
    "title": "Wild animals you can pet",
    "author_id": 3,
    "purchases": 5
},
{
    "title": "GoT Spoilers",
    "author_id": 3,
    "purchases": 4
}

Imagine there are thousands of books and only 50 authors. If I sort only by purchases, I'll get a results page which shows books from only one or two authors. What I need is to have as many authors as possible represented in the results. Is there some combination of function_score + script_score I can use to achieve this? I tried experimenting with Math.exp in a painless script but to no avail.


Solution

  • I ended up using field collapsing, which lets you run a regular query and 'collapse' the results on a particular field. Instead of listing every matching document one after the other, the response contains only the top result for each distinct value in that field. You can then use inner_hits to get a list of n documents for each distinct value, and use from/size inside inner_hits to paginate each group.
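
For illustration, here is a minimal sketch of what such a request body might look like when sent to the _search endpoint of the 'books' index from the question. The match_all query, the inner_hits name "books_by_author", and the size values are assumptions chosen for the example, not part of the original answer. It also assumes author_id is mapped as a numeric or keyword field, since collapsing needs doc values.

```json
{
  "query": { "match_all": {} },
  "sort": [ { "purchases": "desc" } ],
  "from": 0,
  "size": 10,
  "collapse": {
    "field": "author_id",
    "inner_hits": {
      "name": "books_by_author",
      "from": 0,
      "size": 3,
      "sort": [ { "purchases": "desc" } ]
    }
  }
}
```

With the nine sample books above, the top-level hits should be one book per author, ordered by that author's best seller: "Interesting book title" (author 2), then "Silly Walks vol II" (author 3), then "Elasticsearch for dummies" (author 1). Each hit would also carry up to three of that author's books under inner_hits. The top-level from/size page through the authors, while the from/size inside inner_hits page through each author's books.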