My data is like this:
ID | Stage_1 | Stage 2 |
---|---|---|
1 | A | F |
1 | B | G |
1 | C | H |
2 | A | F |
2 | B | G |
2 | C | H |
3 | A | F |
3 | B | G |
4 | A | F |
4 | B | G |
I want to find the number of unique ID's for which Stage_1 = A
exists but Stage_2 = H
does not exist. Here, for ID = 3
and ID = 4
, A exists in Stage_1
but in Stage_2
there is no H
for ID = 3
or ID = 4
.
So the expected result here would be 2.
Outer join
won't apply as I'm getting data from only one table in our database using hive sql terminal.
You could try to left join the table to itself and count the nulls
SELECT COUNT(DISTINCT t1.ID)
FROM your_table t1
LEFT JOIN your_table t2
ON t2.Stage_1 = t1.Stage_1
AND t2.Stage_2 = 'H'
WHERE t1.Stage_2 = 'A'
AND t2.ID IS NULL