Search code examples
sqlhive

How do I find rows that don't exist in a table using hive sql query?


My data is like this:

ID Stage_1 Stage 2
1 A F
1 B G
1 C H
2 A F
2 B G
2 C H
3 A F
3 B G
4 A F
4 B G

I want to find the number of unique ID's for which Stage_1 = A exists but Stage_2 = H does not exist. Here, for ID = 3 and ID = 4, A exists in Stage_1 but in Stage_2 there is no H for ID = 3 or ID = 4.

So the expected result here would be 2.

Outer join won't apply as I'm getting data from only one table in our database using hive sql terminal.


Solution

  • You could try to left join the table to itself and count the nulls

    SELECT COUNT(DISTINCT t1.ID)
    FROM your_table t1
    LEFT JOIN your_table t2
      ON t2.Stage_1 = t1.Stage_1
        AND t2.Stage_2 = 'H'
    WHERE t1.Stage_2 = 'A'
    AND t2.ID IS NULL