Search code examples
sqlgoogle-bigqueryunpivot

Generate column list with select statement for unpivot


I have a very wide table that I'm trying to unpivot without literally listing the hundreds of columns.

I thought something like this should work

WITH column_names_table as (
  SELECT column_name
  FROM `my_project.dataset.INFORMATION_SCHEMA.COLUMNS`
  WHERE 
    table_name = 'my_table'
    AND NOT column_name = 'time_utc'
)

SELECT time_utc, measurement, identifier 
FROM `my_project.dataset.my_table`
    UNPIVOT(
        measurement FOR identifier IN 
        (SELECT column_name from column_names_table)
    )

but I get a Syntax error: Unexpected keyword SELECT. If i just replace the SELECT statement with a literal list of the columns this works. I seems like i'm so close here...

Can I make this work? If not, is there another way to unpivot all columns except a couple? I'm also wondering if there's a good reason this isn't possible..

Input data:

time_utc id1 id2 ... idN
2019-01-24 05:00:00 UTC 0.5 1.2 12
2019-01-24 06:00:00 UTC 0.6 1.3 1.2

Output data:

time_utc measurement identifier
2019-01-24 05:00:00 UTC 0.5 id1
2019-01-24 06:00:00 UTC 0.6 id1
2019-01-24 05:00:00 UTC 1.2 id2
2019-01-24 06:00:00 UTC 1.3 id2
... ... ...

Solution

  • Since you can't specify columns to unpivot dynamically, you need to consider a dynamic sql if you use INFORMATION_SCHEMA.COLUMNS view.

    EXECUTE IMMEDIATE FORMAT("""
      SELECT time_utc, measurement, identifier 
        FROM `my_project.dataset.my_table` 
     UNPIVOT (measurement FOR identifier IN (%s))
    """, (
      SELECT STRING_AGG(column_name) 
        FROM `my_project.dataset.INFORMATION_SCHEMA.COLUMNS`
       WHERE table_name = 'my_table' AND NOT column_name = 'time_utc'
    ));
    
    Query results

    enter image description here

    And you might consider @Mikhail's another approach in below link if INFORMATION_SCHEMA or a dynamic sql is not available to you.

    below query is a slightly modified one to suit your case having same output above.

    select time_utc, measurement, 'id' || (offset + 1) identifier
    from `dataset.my_table` t,
    unnest(split(translate(format('%t', (select as struct * except(time_utc) from unnest([t]))), '() ', ''))) measurement with offset;
    
    Sample Dataset
    CREATE SCHEMA IF NOT EXISTS dataset;
    
    CREATE TABLE IF NOT EXISTS dataset.my_table AS 
    SELECT '2019-01-24 05:00:00 UTC' time_utc, 0.5 id1, 1.2 id2, 1.0 id3, 1.2 id4 UNION ALL
    SELECT '2019-01-24 06:00:00 UTC', 0.6, 1.3, 1.2, 1.4;