I'm new to BigQuery and SQL. I'm trying to return data from Google's BigQuery. I have a query that can get data from tables of form EITHER events_* OR events_intraday_*, but I'm at an early stage and would really like to pull all the data from both sets of tables with a single query. It seems like this should be trivial but nothing I've tried has worked. I can't find an example in any doc which explains how to combine UNNEST with multiple tables in Google SQL.
Here's my original working query:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
`analytics_xxx.events_intraday_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 1:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
`analytics_xxx.events_intraday_*`, `analytics_xxx.events_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 1 error: Column name event_params is ambiguous at [17:15]
Variation 2:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
`analytics_356435236.events_intraday_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param,
`analytics_356435236.events_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 2 error: Column name event_params is ambiguous at [19:15]
Variation 3:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
(`analytics_xxx.events_intraday_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param),
(`analytics_xxx.events_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param)
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 3 error: Syntax error: Expected keyword JOIN but got "," at [16:48] SQL
Variation 4:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
(`analytics_xxx.events_intraday_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param)
JOIN
(`analytics_xxx.events_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param)
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 4 error: Syntax error: Expected keyword JOIN but got "," at [16:48]
...and so on. I've tried a dozen other variations, but nothing has worked. Many thanks for any help!
Just to see what would happen, I asked ChatGPT, which gave me the following code that seems to work perfectly:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM (
SELECT * FROM `analytics_xxx.events_*`
WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
UNION ALL
SELECT * FROM `analytics_xxx.events_intraday_*`
WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
), UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
I had previously tried "UNION ALL" but the key change seems to be splitting the "WHERE" into a section for selecting the table by date suffix, and a selection for selecting the record parameters.