mysql sql count greatest-n-per-group window-functions

How do I return the most common column value for each value in another column using mySQL?

I am looking for a tricky MySQL query to return the most common column value for each value in another column. I could use PHP to do it by each result, but it'd be so much cooler to do it with a single query :)

For example, if have a table like this, called "transactions":

Id  payee   exp     category    
1   Amazon  25.00   Gifts
2   Amazon  30.21   Books
3   Amazon  12.98   Gifts
4   Amazon  15.00   Groceries
5   Amazon  14.54   Gifts
6   Alibaba 55.55   Stock
7   Alibaba 99.00   Stock
8   Alibaba 12.00   Fun
…

The type of result I would want is like this:

payee   count(payee)    category
Amazon  3               Gifts
Alibaba 2               Stock
…

I can do this:

SELECT `payee`, COUNT(`payee`), `category` FROM `transactions` WHERE 1 GROUP BY category ORDER BY COUNT(`payee`) DESC

and get close to what I want:

Amazon  3   Gifts
Alibaba 2   Stock
Amazon  1   Books
Amazon  1   Groceries
Alibaba 1   Fun

but I don't want the non-maximum counts (like Amazon,1,Books for example).

Do I have to do a subquery or something? Or use in?

Solution

You could filter the results of your existing query with a correlated subquery in a having clause, as follows:

select payee, count(*), category 
from transactions t
group by payee, category 
having count(*) = (
    select count(*) 
    from transactions t1 
    where t1.payee = t.payee 
    group by category
    order by count(*) desc limit 1
)
order by count(*) desc

Demo on DB Fiddle:

payee   | count(*) | category
:------ | -------: | :-------
Amazon  |        3 | Gifts   
Alibaba |        2 | Stock

Alernatively, if you are running MySQL 8.0, you can rank the categories of each payee with window function rank() over(), and filter on the top record per group:

select payee, cnt, category
from (
    select 
        payee, 
        count(*) cnt, 
        category, 
        rank() over(partition by payee order by count(*) desc) rn
    from transactions 
    group by category, payee
) t
where rn = 1

Demo on DB Fiddle