Search code examples
sqlpostgresqlgreatest-n-per-groupaggregation

Group by columns and fetch rows with maximum length of string in another column


I have following table in Postgres 11.

table1:

id          col1    col2                        col3                        col4                        
NCT00000374 Drug    olanzapine                  olanzapine                  olanzapine                  
NCT00000390 Drug    imipramine hydrochloride    imipramine hydrochloride    imipramine hydrochloride    
NCT00000390 Drug    imipramine hydrochloride    imipramine hydrochloride    imipramine                  
NCT00000412 Drug    placebo calcitriol          placebo calcitriol          calcitriol                  

I would like to fetch rows with maximum length values per (id, col1, col2, col3).

The desired output is:

id          col1    col2                        col3                        col4                        
NCT00000374 Drug    olanzapine                  olanzapine                  olanzapine                  
NCT00000390 Drug    imipramine hydrochloride    imipramine hydrochloride    imipramine hydrochloride    
NCT00000412 Drug    placebo calcitriol          placebo calcitriol          calcitriol                  

I tried the following query with no success so far:

select * from table1
where length(col4) = max(length(col4))
group by id, col1, col2, col3
order by id

Solution

  • A case for DISTINCT ON:

    SELECT DISTINCT ON (id, col1, col2, col3)
           *
    FROM   table1
    ORDER  BY id, col1, col2, col3, length(col4) DESC NULLS LAST;
    

    Simplest and for few rows per (id, col1, col2, col3) also typically fastest. Detailed explanation:

    For big tables and many rows per group, there are (much) faster techniques: