Search code examples
sqlgoogle-bigquery

How to SELECT from events_* AND events_intraday_* tables using SQL in Google BigQuery


I'm new to BigQuery and SQL. I'm trying to return data from Google's BigQuery. I have a query that can get data from tables of form EITHER events_* OR events_intraday_*, but I'm at an early stage and would really like to pull all the data from both sets of tables with a single query. It seems like this should be trivial but nothing I've tried has worked. I can't find an example in any doc which explains how to combine UNNEST with multiple tables in Google SQL.

Here's my original working query:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  `analytics_xxx.events_intraday_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 1:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  `analytics_xxx.events_intraday_*`, `analytics_xxx.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 1 error: Column name event_params is ambiguous at [17:15]

Variation 2:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  `analytics_356435236.events_intraday_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param,
  `analytics_356435236.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 2 error: Column name event_params is ambiguous at [19:15]

Variation 3:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  (`analytics_xxx.events_intraday_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param),
  (`analytics_xxx.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param)
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 3 error: Syntax error: Expected keyword JOIN but got "," at [16:48] SQL

Variation 4:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  (`analytics_xxx.events_intraday_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param)
JOIN
  (`analytics_xxx.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param)
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 4 error: Syntax error: Expected keyword JOIN but got "," at [16:48]

...and so on. I've tried a dozen other variations, but nothing has worked. Many thanks for any help!


Solution

  • Just to see what would happen, I asked ChatGPT, which gave me the following code that seems to work perfectly:

    SELECT
      user_id as user_id,
      TIMESTAMP_MICROS(event_timestamp) as timestamp,
      experiment_id_param.value.string_value AS experiment_id,
      variation_id_param.value.int_value AS variation_id,
      geo.country as country,
      traffic_source.source as source,
      traffic_source.medium as medium,
      device.category as device,
      device.web_info.browser as browser,
      device.operating_system as os
    FROM (
      SELECT * FROM `analytics_xxx.events_*`
      WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
      
      UNION ALL
      
      SELECT * FROM `analytics_xxx.events_intraday_*`
      WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
    ), UNNEST(event_params) AS experiment_id_param,
      UNNEST(event_params) AS variation_id_param
    WHERE
      event_name = 'experiment_viewed'  
      AND experiment_id_param.key = 'experiment_id'
      AND variation_id_param.key = 'variation_id'
      AND user_id is not null
    

    I had previously tried "UNION ALL" but the key change seems to be splitting the "WHERE" into a section for selecting the table by date suffix, and a selection for selecting the record parameters.