Search code examples
sqlpostgresqlsubquerynotin

I hit a weird issue running a query against a PostgreSQL Database using NOT IN and don't understand why it didn't work


I have a hierarchical table with an id and parent_id column, the parent_id has a foreign key to the id column. Each row can only have one parent, but multiple rows can be tied to the same parent. I wanted to retrieve all the rows that didn't have any children.

I attempted this with the following query

SELECT *
FROM table
WHERE id NOT IN (
    SELECT DISTINCT(parent_id)
    FROM table
)

This returned 0 rows. If i change the NOT IN to IN it correctly returns the rows that have children (other rows tie to it through their parent_id)

I ended up getting this working:

SELECT *
FROM table
WHERE id NOT IN(
    SELECT id
    FROM table
    WHERE id IN (
        SELECT DISTINCT(parent_id)
        FROM table
    )
)

But I don't understand why the first query didn't work? Can anybody help me to understand what's going on here? Do I not understand how NOT IN should work?


Solution

  • Try

    SELECT *
    FROM table
    WHERE id NOT IN (
        SELECT DISTINCT(parent_id)
        FROM table
        WHERE parent_id IS NOT NULL
    )
    

    For example:

    with t(x) as (values(1),(2))
    select 3
    where 3 not in (select x from t);
    
    ┌──────────┐
    │ ?column? │
    ├──────────┤
    │        3 │
    └──────────┘
    

    but

    with t(x) as (values(1),(2),(null))
    select 3
    where 3 not in (select x from t);
    
    ┌──────────┐
    │ ?column? │
    ├──────────┤
    └──────────┘
    

    It is because the DBMS can't made decision is id = null or not (the result is undefined)

    You could to fix it as mentioned above or using not exists:

    with t(x) as (values(1),(2),(null))
    select 3
    where not exists (select x from t where x = 3);
    
    ┌──────────┐
    │ ?column? │
    ├──────────┤
    │        3 │
    └──────────┘