Very new to BQ so please be patient.
I have a table with various columns (in the example they are col1 to col4 for brevity) and I have a regex that determines which column names are to be grouped together (example - ac_v\d+_final_p\w+). What I want to do is identify how many times non-null values appear within the grouping of columns for a specific row. From research, I was able to craft the attached query, however, this obviously only returns the count across the entire table and not the rows associated with the defined regex as desired.
Simplified Data Structure:
key | col1 | col2 | col3 | col4 | lol1 |
---|---|---|---|---|---|
2 | 0.0025 | null | null | null | null |
3 | 0.0015 | null | 0.0005 | null | null |
1 | null | null | null | 0.000 | 0.3 |
Desired Outcome: I want to count only those columns with the col name prefix.
key | count_non_nulls | count_nulls |
---|---|---|
1 | 1 | 3 |
2 | 1 | 3 |
3 | 2 | 2 |
Is there a way to achieve this in BQ Standard SQL?
Thank you in anticipation of your help.
BEGIN
#standardSQL
CREATE TEMP TABLE `mytable` AS (
SELECT 1 AS key, null AS col1, null AS col2, null AS col3, 0.0001 AS col4 UNION ALL
SELECT 2, 0.0025, null, null, null UNION ALL
SELECT 3, 0.0015, null, 0.0005, null
)
;
SELECT
COUNTIF(value not in ('null', '')) AS count_non_nulls,
COUNTIF(value in ('null', '')) AS count_nulls,
COUNT(value) count_all
FROM `mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value
;
END
Consider below approach
select key,
(
select as struct
countif(column_value != 'null') as count_non_nulls,
countif(column_value = 'null') as count_nulls
from unnest(split(translate(to_json_string(t), '{}"', ''))) kv,
unnest([struct(split(kv, ':')[offset(0)] as column_name, split(kv, ':')[offset(1)] as column_value)])
where column_name != 'key'
and starts_with(column_name, 'col')
).*
from `project.dataset.table` t
if applied to sample data in your question - output is
Note: if you need to use whatever regex you have - you can use it instead of below line
starts_with(column_name, 'col')