I have a BigQuery table that contains a column which has a list of 'categories' (delimited by " / "):
{
"ID": "452",
"Location": "France",
"Calories": "400",
"Categories": "/ brown / nutty / salty /"
},
{
"ID": "288",
"Location": "UK",
"Calories": "800",
"Categories": "/ brown / roasted / nutty /"
},
I connect to my table using the following SQL query within Google Data Studio:
#standardSQL
SELECT
ID,
Location,
Calories,
Categories
FROM table_name;
I am trying to group rows in the table by 'Categories'.
I should then be able to filter the rows by a category and the metrics can be averaged. In the example above, if I filtered by 'Nutty', the average 'Calories' would be '600'.
The CASE expression I have tried only allows me to match the last 'Category' within the Category column, rather than matching to all rows that include the string:
CASE
WHEN REGEXP_MATCH(Categories, '.*nutty.*') THEN 'Nutty'
WHEN REGEXP_MATCH(Categories, '.*brown.*') THEN 'Brown'
WHEN REGEXP_MATCH(Categories, '.*salty.*') THEN 'Salty'
END
I believe this is an aggregation issue, but is there any way to allow each row to be matched within a 'Category' dimension multiple times in this way?
Any help much appreciated, thank you.
Using CASE
in your use-case is not practical - instead you should use REGEXP_EXTRACT_ALL()
or SPLIT()
along with UNNEST()
as below example shows (BigQuery Standard SQL)
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT "452" ID,"France" Location,400 Calories,"/ brown / nutty / salty /" Categories
UNION ALL
SELECT "288","UK",800,"/ brown / roasted / nutty /"
)
SELECT ID, Location, Calories, Category
FROM `project.dataset.your_table`,
UNNEST(REGEXP_EXTRACT_ALL(Categories, r' (\w+) ')) Category
this will give you below result
Row ID Location Calories Category
1 452 France 400 brown
2 452 France 400 nutty
3 452 France 400 salty
4 288 UK 800 brown
5 288 UK 800 roasted
6 288 UK 800 nutty
Now, you can GROUP BY
whatever logic you want and all Categories will be accounted appropriatelly