I am new to BigQuery and have a problem that I think is broadly applicable to a lot of different group by / data manipulation problems. I run the following query which groups by team, game and by zone, and returns the count for each:
SELECT
rebounds.team_id,
rebounds.game_id,
CASE
WHEN distance < 4 THEN 'zone1'
WHEN NOT some_bool THEN 'zone2'
WHEN distance >= 4 and distance2 < 12 THEN 'zone3'
WHEN some_bool AND distance >= 4 THEN 'zone4'
END AS this_zone,
COUNT(*) AS my_count
FROM t1
GROUP BY 1,2,3
And I get the following table (showing for a single game_id / team_id combination for simplicity to emphasize the question):
team_id game_id this_zone my_count
12 15 zone1 5
12 15 zone2 3
In the example above, there were no observations that met the criteria for zone3, and as a result the table only returned two rows for this game_id + team_id combination. However, I would like to have a 3rd row, so that the table looks like this:
team_id game_id this_zone my_count
12 15 zone1 5
12 15 zone2 3
12 15 zone3 2
12 15 zone4 0
Is this possible, to specify one of the groupby variables (in this case this_zone
), and have BigQuery create a row with count 0 if the count is zero?
I've updated the CASE WHEN to more accurately reflect the complexity of how the zones are created. In short, they rely on two different numerical values (distance and distance2 in this example), as well as a boolean value (some_bool).
Below for BigQuery Standard SQL
#standardSQL
SELECT team_id, game_id, this_zone, IFNULL(my_count, 0) AS my_count
FROM (
SELECT DISTINCT
rebounds.team_id,
rebounds.game_id,
this_zone
FROM t1, UNNEST(['zone1', 'zone2', 'zone3', 'zone4']) this_zone
) A
LEFT JOIN (
SELECT
rebounds.team_id,
rebounds.game_id,
CASE
WHEN distance < 4 THEN 'zone1'
WHEN NOT some_bool THEN 'zone2'
WHEN distance >= 4 AND distance2 < 12 THEN 'zone3'
WHEN some_bool AND distance >= 4 THEN 'zone4'
END AS this_zone,
COUNT(*) AS my_count
FROM t1
GROUP BY 1,2,3
) B
USING (team_id, game_id, this_zone)
Above is generic enough and has no dependency on how complex or not your logic - you just generate all expected rows (sub-query A) and left join it on your original query - that's all!