Search code examples
sqloptimizationsnowflake-cloud-data-platformdata-vault

Snowflake how can we loop over each row of a temp table and insert its values with into another table where each field with its value is a single row?


We are loading data into a fact table, we the original temporary table on Snowflake looks like the following:

enter image description here

Where indicator_nbr fields are questions asked within a survey.

We are using data modelling techniques in building our warehouse database, so the data will be added into a fact table like so:

enter image description here

Then the same for the indicator 2 and 3 and so on if there is other questions.

Each Field with its value will be as a single row. Of course there is other metadata to be added like load_dt and record_src but they are not a problem.

The current script is doing the following:

  1. Get the fields into an array => fields_array = ['indicator_1', 'indicator_2', 'indicator_3']

  2. A loop will run over the array and start adding each field with its value for each row. So imagine we are having 100 rows, we will run 300 inserts, one at a time:

    for (var col_num = 0; col_num<fields_array.length; col_num = col_num+1) {

        var COL_NAME = fields_array[col_num];
    
        var field_value_query = "INSERT INTO SAT_FIELD_VALUE SELECT md5(id), CURRENT_TIMESTAMP(), NULL, 'SRC', "+COL_NAME+", md5(foreign_key_field) FROM "+TEMP_TABLE_NAME+"";
    

    }

As mentioned in the comment on this post showing the full script, it is better to loop over a string concatenating each from values of the insert query.

There is 2 issues of the suggested solution:

  1. There is a size limit of a query on Snowflake (it should be less than 1 MB);
  2. if we are going to loop over each field and concatenate the from values, we should do a select query as well from the temp table to get the value of the column, so there will be no optimization, or we will reduce the time a little bit but not to much.

EDIT: A possible solution

I was thinking of doing an sql query selecting everything from the temp table, and do hashing and everything and save it into an array after transposing, but I have no idea how to do it.


Solution

  • Not sure if this is what you're looking for but it seems as though you just want to do a pivot:

    Setup example scenario

    create or replace transient table source_table
    (
        id          number,
        indicator_1 varchar,
        indicator_2 number,
        indicator_3 varchar
    );
    
    insert overwrite into source_table
    values (1, 'Test', 2, 'DATA'),
           (2, 'Prod', 3, 'DATA'),
           (3, 'Test', 1, 'METADATA'),
           (4, 'Test', 1, 'DATA')
    ;
    
    create or replace transient table target_table
    (
        hash_key varchar,
        md5      varchar
    );
    

    Run insert

    select
        name_col as hash_key,
        md5(id)
    from (select
              id,
              indicator_1,
              indicator_2::varchar as indicator_2,
              indicator_3
          from source_table) unpivot ( val_col for name_col in (indicator_1, indicator_2, indicator_3))
    ;
    

    This results in a target_table that looks like this:

    +-----------+--------------------------------+
    |HASH_KEY   |MD5                             |
    +-----------+--------------------------------+
    |INDICATOR_1|c4ca4238a0b923820dcc509a6f75849b|
    |INDICATOR_2|c4ca4238a0b923820dcc509a6f75849b|
    |INDICATOR_3|c4ca4238a0b923820dcc509a6f75849b|
    |INDICATOR_1|c81e728d9d4c2f636f067f89cc14862c|
    |INDICATOR_2|c81e728d9d4c2f636f067f89cc14862c|
    |INDICATOR_3|c81e728d9d4c2f636f067f89cc14862c|
    |INDICATOR_1|eccbc87e4b5ce2fe28308fd9f2a7baf3|
    |INDICATOR_2|eccbc87e4b5ce2fe28308fd9f2a7baf3|
    |INDICATOR_3|eccbc87e4b5ce2fe28308fd9f2a7baf3|
    |INDICATOR_1|a87ff679a2f3e71d9181a67b7542122c|
    |INDICATOR_2|a87ff679a2f3e71d9181a67b7542122c|
    |INDICATOR_3|a87ff679a2f3e71d9181a67b7542122c|
    +-----------+--------------------------------+