Search code examples
mysqlsqlcountgreatest-n-per-groupwindow-functions

How do I return the most common column value for each value in another column using mySQL?


I am looking for a tricky MySQL query to return the most common column value for each value in another column. I could use PHP to do it by each result, but it'd be so much cooler to do it with a single query :)

For example, if have a table like this, called "transactions":

Id  payee   exp     category    
1   Amazon  25.00   Gifts
2   Amazon  30.21   Books
3   Amazon  12.98   Gifts
4   Amazon  15.00   Groceries
5   Amazon  14.54   Gifts
6   Alibaba 55.55   Stock
7   Alibaba 99.00   Stock
8   Alibaba 12.00   Fun
…

The type of result I would want is like this:

payee   count(payee)    category
Amazon  3               Gifts
Alibaba 2               Stock
…

I can do this:

SELECT `payee`, COUNT(`payee`), `category` FROM `transactions` WHERE 1 GROUP BY category ORDER BY COUNT(`payee`) DESC 

and get close to what I want:

Amazon  3   Gifts
Alibaba 2   Stock
Amazon  1   Books
Amazon  1   Groceries
Alibaba 1   Fun

but I don't want the non-maximum counts (like Amazon,1,Books for example).

Do I have to do a subquery or something? Or use in?


Solution

  • You could filter the results of your existing query with a correlated subquery in a having clause, as follows:

    select payee, count(*), category 
    from transactions t
    group by payee, category 
    having count(*) = (
        select count(*) 
        from transactions t1 
        where t1.payee = t.payee 
        group by category
        order by count(*) desc limit 1
    )
    order by count(*) desc
    

    Demo on DB Fiddle:

    payee   | count(*) | category
    :------ | -------: | :-------
    Amazon  |        3 | Gifts   
    Alibaba |        2 | Stock   
    

    Alernatively, if you are running MySQL 8.0, you can rank the categories of each payee with window function rank() over(), and filter on the top record per group:

    select payee, cnt, category
    from (
        select 
            payee, 
            count(*) cnt, 
            category, 
            rank() over(partition by payee order by count(*) desc) rn
        from transactions 
        group by category, payee
    ) t
    where rn = 1
    

    Demo on DB Fiddle