Search code examples
sqlpostgresqlgroup-by

select distinct values and exclude rows by condition


There is a table with columns category not null, sub_category default null.

  • category is always set
  • sub_category is optional

I need to select distinct category,sub_category from table and

  • exclude records where sub_category is not null and null
  • keep record if there is only one combination of category + <null>
category    sub_category  
girl_toys        fluffy
girl_toys        dolls
drawing          <null> -- keep it since only one null sub_category
paining          red_color
paining          <null> -- exclude since there is non null sub_category

expected result:

category    sub_category  
girl_toys        fluffy
girl_toys        dolls
drawing          <null> -- keep is since drawing has only one null
paining          red_color -- <null> record excluded since there is non null sub_category exists 


Solution

  • One way of many:

    SELECT DISTINCT category, sub_category 
    FROM   tbl t1
    WHERE  sub_category IS NOT NULL
       OR  NOT EXISTS (
             SELECT FROM tbl t2
             WHERE  t2.category = t1.category
             AND    t2.sub_category IS NOT NULL
             );
    

    I assume you also want to keep one copy if there are multiple instances of the same category + null. You really only want to exclude category + null if there is another row for the same category with a not-null sub_category. At least that's what your "expected result" suggests.

    If the table is big and there are many duplicates per (category, sub_category), performance can be optimized with an emulated index-skip scan. See: