Search code examples
sqlpostgresqlaggregate-functionsgreatest-n-per-group

Get rows with maximum count


I need to perform a query to get the candies that are most liked by kids and here's what I've got:

SELECT COUNT(*) as counts,candies.name
FROM candies 
INNER JOIN kid_candy ON kid_candy.candy_id = candies.id
INNER JOIN kids ON kids.id = kid_candy.kid_id
GROUP BY candies.name

which would return:

counts  | name
--------+---------
  3     | snowbear
  3     | whiterabbit
  2     | lollipop

All I want to see would be just

counts  | name
--------+---------
  3     | snowbear
  3     | whiterabbit

So what should my query be?


Solution

  • Assuming a classical n:m relationship between kids and candies like detailed here:

    You should provide such details in your question.

    SELECT c.id, kc.counts, c.name
    FROM  (
       SELECT candy_id AS id, count(*) AS counts
            , rank() OVER (ORDER BY count(*) DESC) AS rnk
       FROM   kid_candy
       GROUP  BY 1
       ) kc
    JOIN   candies c USING (id)
    WHERE  kc.rnk = 1;
    

    This should be considerably faster than a query with two CTEs and needless joins.

    Major points

    • It is potentially wrong (and more expensive) to group by candies.name. The name might not be unique. Use the primary key column for that, which is probably candies.id

    • Assuming referential integrity we do not need to join to the table kids at all.

    • Since we need to inspect the whole table, it is faster to aggregate first and join to candies to get the name later.

    • You can run a window function over an aggregate function: