Search code examples
sqlgoogle-bigqueryleft-joindistinctcross-join

Insert missing values in column at all levels of another column in SQL?


I have been working with some big data in SQL/BigQuery and found that it has some holes in it that need to be filled with values in order to complete the dataset. What I'm struggling with is how to insert the missing values properly.

Say that I have multiple levels of a variable (1, 2, 3... no upper bound) and for each of these levels, they should have an A, B, C value. Some of these records will have data, others will not.

Current dataset:

level value data
1     A     1a_data
1     B     1b_data
1     C     1c_data
2     A     2a_data
2     C     2c_data
3     B     3b_data

What I want the dataset to look like:

level value data
1     A     1a_data
1     B     1b_data
1     C     1c_data
2     A     2a_data
2     B     NULL
2     C     2c_data
3     A     NULL
3     B     3b_data
3     C     NULL

What would be the best way to do this?


Solution

  • You need a CROSS join of the distinct levels with the distinct values and a LEFT join to the table:

    SELECT l.level, v.value, t.data
    FROM (SELECT DISTINCT level FROM tablename) l
    CROSS JOIN (SELECT DISTINCT value FROM tablename) v
    LEFT JOIN tablename t ON t.level = l.level AND t.value = v.value
    ORDER BY l.level, v.value;
    

    See the demo.