Let's say I have the following sample input table:
| ID | Code |
|------|------|
| 1234 | 121A |
| 1234 | 334B |
| 1234 | 543A |
| 1112 | 121A |
| 1112 | 333B |
| 2234 | 121A |
| 2234 | 301B |
| 3314 | 121A |
I want to create a new column called "Count". The "Count" column gets a 1 value if and only if across the entire table the unique ID meets the following conditions:
Note that if the unique ID has 121A and one of the two codes listed above and another code (not of the two) then that's fine as well.
Otherwise the ID gets a "Count" value of 0.
Given the rules above, my desired output table is below:
ID | Code |
---|---|
1234 | 1 |
1112 | 1 |
2234 | 0 |
3314 | 0 |
I know how to do this in Python but am not as familiar with SQL, especially BigQuery SQL. How do I write this query using BigQuery SQL?
Looks like you would want conditional aggregation, something like:
SELECT
ID,
CASE WHEN COUNT(CASE WHEN Code = '121A' THEN 1 END) = 1
AND COUNT(CASE WHEN Code IN ('333B', '334B ') THEN 1 END) = 1
THEN 1
ELSE 0 END AS Code
FROM YourTable
GROUP BY
ID;
The above code requires that the group satisfies each of the conditions exactly once, not more. If that is not your intention then please be clear.