Search code examples
sqlgroup-bygoogle-bigquerycountconditional-statements

BigQuery SQL: Conditionally Count Column Values (if exactly 2/3 codes exist from list) While Grouping By Unique ID


Let's say I have the following sample input table:

| ID   | Code |
|------|------|
| 1234 | 121A |
| 1234 | 334B |
| 1234 | 543A |
| 1112 | 121A |
| 1112 | 333B |
| 2234 | 121A |
| 2234 | 301B |
| 3314 | 121A |

I want to create a new column called "Count". The "Count" column gets a 1 value if and only if across the entire table the unique ID meets the following conditions:

  • must have 121A in the "Code" column
  • can only have 1 of the following two codes: 333B, 334B in addition to 121A

Note that if the unique ID has 121A and one of the two codes listed above and another code (not of the two) then that's fine as well.

Otherwise the ID gets a "Count" value of 0.

Given the rules above, my desired output table is below:

ID Code
1234 1
1112 1
2234 0
3314 0

I know how to do this in Python but am not as familiar with SQL, especially BigQuery SQL. How do I write this query using BigQuery SQL?


Solution

  • Looks like you would want conditional aggregation, something like:

    SELECT
      ID,
      CASE WHEN COUNT(CASE WHEN Code = '121A' THEN 1 END) = 1
            AND COUNT(CASE WHEN Code IN ('333B', '334B ') THEN 1 END) = 1
        THEN 1
        ELSE 0 END AS Code
    FROM YourTable
    GROUP BY
      ID;
    

    The above code requires that the group satisfies each of the conditions exactly once, not more. If that is not your intention then please be clear.