We are loading data into a fact table, we the original temporary table on Snowflake looks like the following:
Where indicator_nbr
fields are questions asked within a survey.
We are using data modelling techniques in building our warehouse database, so the data will be added into a fact table like so:
Then the same for the indicator 2 and 3 and so on if there is other questions.
Each Field with its value will be as a single row. Of course there is other metadata to be added like load_dt
and record_src
but they are not a problem.
The current script is doing the following:
Get the fields into an array => fields_array = ['indicator_1', 'indicator_2', 'indicator_3']
A loop will run over the array and start adding each field with its value for each row. So imagine we are having 100 rows, we will run 300 inserts, one at a time:
for (var col_num = 0; col_num<fields_array.length; col_num = col_num+1) {
var COL_NAME = fields_array[col_num];
var field_value_query = "INSERT INTO SAT_FIELD_VALUE SELECT md5(id), CURRENT_TIMESTAMP(), NULL, 'SRC', "+COL_NAME+", md5(foreign_key_field) FROM "+TEMP_TABLE_NAME+"";
}
As mentioned in the comment on this post showing the full script, it is better to loop over a string concatenating each from values
of the insert query.
There is 2 issues of the suggested solution:
from values
, we should do a select
query as well from the temp table to get the value of the column, so there will be no optimization, or we will reduce the time a little bit but not to much.EDIT: A possible solution
I was thinking of doing an sql query selecting everything from the temp table, and do hashing and everything and save it into an array after transposing, but I have no idea how to do it.
Not sure if this is what you're looking for but it seems as though you just want to do a pivot:
Setup example scenario
create or replace transient table source_table
(
id number,
indicator_1 varchar,
indicator_2 number,
indicator_3 varchar
);
insert overwrite into source_table
values (1, 'Test', 2, 'DATA'),
(2, 'Prod', 3, 'DATA'),
(3, 'Test', 1, 'METADATA'),
(4, 'Test', 1, 'DATA')
;
create or replace transient table target_table
(
hash_key varchar,
md5 varchar
);
Run insert
select
name_col as hash_key,
md5(id)
from (select
id,
indicator_1,
indicator_2::varchar as indicator_2,
indicator_3
from source_table) unpivot ( val_col for name_col in (indicator_1, indicator_2, indicator_3))
;
This results in a target_table that looks like this:
+-----------+--------------------------------+
|HASH_KEY |MD5 |
+-----------+--------------------------------+
|INDICATOR_1|c4ca4238a0b923820dcc509a6f75849b|
|INDICATOR_2|c4ca4238a0b923820dcc509a6f75849b|
|INDICATOR_3|c4ca4238a0b923820dcc509a6f75849b|
|INDICATOR_1|c81e728d9d4c2f636f067f89cc14862c|
|INDICATOR_2|c81e728d9d4c2f636f067f89cc14862c|
|INDICATOR_3|c81e728d9d4c2f636f067f89cc14862c|
|INDICATOR_1|eccbc87e4b5ce2fe28308fd9f2a7baf3|
|INDICATOR_2|eccbc87e4b5ce2fe28308fd9f2a7baf3|
|INDICATOR_3|eccbc87e4b5ce2fe28308fd9f2a7baf3|
|INDICATOR_1|a87ff679a2f3e71d9181a67b7542122c|
|INDICATOR_2|a87ff679a2f3e71d9181a67b7542122c|
|INDICATOR_3|a87ff679a2f3e71d9181a67b7542122c|
+-----------+--------------------------------+