Search code examples
sqlnestedgoogle-bigquery

Transform a column of type string to an array/record i.e. nesting a column


I am trying to get calculate and retrieve some indicators from mutiple tables I have in my dataset on bigquery. I am want to invoke nesting on sfam which is a column of strings which I can't do for now i.e. it could have values or be null. So the goal is to transform that column into an array/record...that's the idea that came to mind and I have no idea how to go about doing it.

The product and cart are grouped by key_web, dat_log, univ, suniv, fam and sfam.

The data is broken down into universe refered to as univ which is composed of sub-universe refered to as suniv. Sub-universes contain families refered to as 'fam' which may or may not have sub-families refered to as sfam. I want to invoke nesting on prd.sfam to reduce the resulting columns.

The data is collected from Google Analytics for insight into website trafic and users activities.

I am trying to get information and indicators about each visitor, the amount of time he/she spent on particular pages, actions taken and so on. The resulting table gives me the sum of time spent on those pages, sum of total number of visits for a single day and a breakdown to which category it belongs, thus the univ, suniv, fam and sfam columns which are of type string (the sfam could be null since some sub-universes suniv only have families famand don't go down to a sub-family level sfam.

  • dat_log: refers to the date

  • nrb_fp: number of views for a product page

  • tps_fp: total time spent on said page

I tried different methods that I found online but none worked, so I post my code and problem in hope of finding guidance and a solution !

A simpler query would be:

 select
        prd.key_web
        , dat_log
        , prd.nrb_fp
        , prd.tps_fp
        , prd.univ
        , prd.suniv
        , prd.fam
        , prd.sfam
    from product as prd
    left join cart as cart
        on prd.key_web = cart.key_web
        and prd.dat_log = cart.dat_log
        and prd.univ = cart.univ
        and prd.suniv = cart.suniv
        and prd.fam = cart.fam
        and prd.sfam = cart.sfam

And this is a sample result of the query for the last 6 columns in text and images:

Again, I want to get a column of array as sfam where I have all the string values of sfam even nulls.

I limited the output to only only the last 6 columns, the first 3 are the row, key_web and dat_log. Each fam is composed of several sfam or none (null), I want to be able to do nesting on either the fam or sfam.


Solution

  • Obviously, in BigQuery you cannot output array which holds NULL, but if for some reason you need to preserve them somehow - the workaround is to create array of structs as opposed to arrays of single elements

    For example (BigQuery Standard SQL) if you try to execute below

    SELECT ['a', 'b', NULL] arr1, ['x', NULL, NULL] arr2    
    

    you will get error: Array cannot have a null element; error in writing field arr1

    While if you will try below

    SELECT ARRAY_AGG(STRUCT(val1, val2)) arr
    FROM UNNEST(['a', 'b', NULL]) val1 WITH OFFSET
    JOIN UNNEST(['x', NULL, NULL]) val2 WITH OFFSET
    USING(OFFSET)
    

    you get result

    Row     arr.val1    arr.val2     
    1       a           x    
            b           null     
            null        null       
    

    As you can see - approaching this way - you can have have even both elements as NULL