Search code examples
sqlpostgresqlsql-order-bygreatest-n-per-groupsql-null

Selectively filter rows where column value is not null PostgreSQL


I have following table in Postgres 11.0.

col1        col2        col3                                                       col4             col5
1           NCT02582996 acetaminophen+caffeine+dihydroergotamine+metoclopramide.    metoclopramide. 204756
1           NCT02582996 acetaminophen+caffeine+dihydroergotamine+metoclopramide.    metoclopramide. null
3           NCT02582997 acetaminophen                                               metoclopramide. 204758
4           NCT02582998 ibuprufin                                                   ibuprufin       null

For same col1 value, I would like to keep only those rows where col5 is not null. and discard the other row.

Desired output is:

col1        col2        col3                                                       col4             col5
1           NCT02582996 acetaminophen+caffeine+dihydroergotamine+metoclopramide.    metoclopramide. 204756
3           NCT02582997 acetaminophen                                               metoclopramide. 204758
4           NCT02582998 ibuprufin                                                   ibuprufin       null
select * from tbl where col5 is not null  --excluding row 4

I am not sure how can I achieve this. Any help is highly appreciated.


Solution

  • I think you can use use distinct on:

    select distinct on (col1) t.*
    from tbl t
    order by col1, col5
    

    This generates one row per distinct value in col1 only. If col1 has both a non-null and null value in col5, then the non-null value is preserved. If there is only one row available, with a null value, then it is selected.