Search code examples
pythonsqlsnowflake-cloud-data-platformetl

Transpose/Pivot data from one column into multiple columns per row in Snowflake table


I have a table with customer data, and variables for the number of values per row, the column prefix, and the filler value.

I'm trying to write a dynamic statement/query to transpose the customer data based on the ID column into a new table with the specified number of columns per row, named with the column prefix and sequential numbers. If any column doesn't have data, it should use the filler value.

Example:

Input table:

Cust_ID Cust_Values
1 CI10
1 CI11
1 CI12
1 CI13
1 CI14
2 CI20

Variables:

values_per_row = 2
col_prefix = 'CV_'
filler_value = 'XX'

Expected output:

When values_per_row is 2:

ID CV_1 CV_2
1 CI10 CI11
1 CI12 CI13
1 CI14 XX
2 CI20 XX

When values_per_row is 3:

ID CV_1 CV_2 CV_3
1 CI10 CI11 CI12
1 CI13 CI14 XX
2 CI20 XX XX

The final output should have the column names, based on the col_prefix and values_per_row variables. The solution should be flexible and reusable so that it can be used to transpose data based on the variables.

I've been trying hard to generate this efficiently in snowflake for some time now. Any help/insight is greatly appreciated.


Solution

  • The following SQL gives the desired output for 3 columns (you can modify the code for other options for number of columns):

    with t0(cust_id, cust_value) as (
        select * from values
            (1, 'CI10'),
            (1, 'CI11'),
            (1, 'CI12'),
            (1, 'CI13'),
            (1, 'CI14'),
            (2, 'CI20')
    ), t1 as (
        select 
            *,
            row_number() over (partition by cust_id order by cust_value)-1 as rn
        from t0
    )
    select
        cust_id,
        max(case when rn %3 = 0 then cust_value end) as cv_1,
        ifnull(max(case when rn %3 = 1 then cust_value end),'XX') as cv_2,
        ifnull(max(case when rn %3 = 2 then cust_value end),'XX') as cv_3
    from t1
    group by cust_id, floor(rn/3)
    order by cust_id, floor(rn/3)
    ;
    
    CUST_ID CV_1 CV_2 CV_3
    1 CI1O CI11 CI12
    1 CI13 CI14 XX
    2 CI20 XX XX

    For a dynamic SQL approach where you specify the variables, generate the SQL dynamically, and return the results you can use the following:

    declare
      values_per_row int := 3;
      col_prefix text := 'CV_';
      filler_value text := 'XX';
      sql text;
      res resultset;
    begin
      sql := '
          with t0 as (
              select
                *,
                row_number() over (partition by cust_id order by cust_values)-1 as rn
                from cust
        )
        select
          cust_id
      ';
      
      for i in 1 to values_per_row do
        sql := sql || ', ifnull(max(case when rn % ' || values_per_row || ' = ' || i || '-1 then cust_values end), ''' || filler_value || ''') as ' || col_prefix || i;
      end for;
      sql := sql || ' from t0 group by cust_id, floor(rn/' || values_per_row || ') order by cust_id, floor(rn/' || values_per_row || ');';
      
      res := (execute immediate :sql);
      return table (res);
    end;