sql postgresql count aggregate greatest-n-per-group

Get rows with maximum count per one column - while grouping by two columns


I'm trying to get the maximum count of a field. Here is what I have and what I've tried so far.

| col1 | col2 |
|  A   |  B   |
|  A   |  B   |
|  A   |  D   |
|  A   |  D   |
|  A   |  D   |
|  C   |  F   |
|  C   |  G   |
|  C   |  F   |

I'm trying to get the maximum number of occurrences of col2, grouped by col1.

With this query I get the occurrences grouped by col1 and col2.

SELECT col1, col2, count(*) as conta 
FROM tab 
-- WHERE ... (conditions omitted)
GROUP BY col1, col2 
ORDER BY col1, col2

And I get:

| col1 | col2 | conta |
|  A   |  B   |   2   |
|  A   |  D   |   3   |
|  C   |  F   |   2   |
|  C   |  G   |   1   |

Then I used this query to get max of count:

SELECT max(conta) as conta2, col1 
FROM (
    SELECT col1, col2, count(*) as conta 
    FROM tab 
    -- WHERE ... (conditions omitted)
    GROUP BY col1, col2 
    ORDER BY col1, col2
) AS derivedTable 
GROUP BY col1

And I get:

| col1 | conta |
|  A   |   3   |
|  C   |   2   |

What I'm missing is the value of col2. I would like something like this:

| col1 | col2 | conta |
|  A   |  D   |   3   |
|  C   |  F   |   2   |

The problem is that if I try to select col2, I get an error saying the column must appear in the GROUP BY clause or be used in an aggregate function. But adding it to the GROUP BY is not the right approach, because that gives me one row per (col1, col2) pair again.


Solution

  • Simpler & faster (and correct):

    SELECT DISTINCT ON (col1)
           col1, col2, count(*) AS conta
    FROM   tab 
    GROUP  BY col1, col2 
    ORDER  BY col1, conta DESC;
    

    db<>fiddle here (based on a_horse's fiddle)
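
    To try the query locally, here is a minimal sketch that rebuilds the question's sample data. The temporary table definition and the text column types are assumptions (the question does not show the real schema), and the trailing col2 in ORDER BY is an added tiebreaker that is not part of the query above:

    ```sql
    -- Hypothetical setup: a temporary table named tab holding the question's sample rows.
    CREATE TEMP TABLE tab (col1 text, col2 text);

    INSERT INTO tab (col1, col2) VALUES
      ('A', 'B'), ('A', 'B'),
      ('A', 'D'), ('A', 'D'), ('A', 'D'),
      ('C', 'F'), ('C', 'G'), ('C', 'F');

    -- GROUP BY runs first and yields one row per (col1, col2) with its count.
    -- DISTINCT ON (col1) then keeps the first row per col1 in ORDER BY order,
    -- i.e. the row with the highest count.
    -- The trailing col2 is an assumed tiebreaker so the pick is deterministic
    -- when two col2 values share the highest count within one col1.
    SELECT DISTINCT ON (col1)
           col1, col2, count(*) AS conta
    FROM   tab
    GROUP  BY col1, col2
    ORDER  BY col1, conta DESC, col2;
    ```

    On this sample data the result should be the two rows (A, D, 3) and (C, F, 2), which matches the output the question asks for. Without a tiebreaker, which of several equally frequent col2 values is returned is arbitrary.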

    DISTINCT ON is applied after aggregation, so we don't need a subquery or CTE. Consider the sequence of events in a SELECT query: