Search code examples
pythonjsonsnowflake-cloud-data-platformsnowflake-connector

Passing variant data to Snowflake


I have written a stored procedure in Snowflake. I am calling it from a python lambda function and I have already verified that I am passing valid json. I can't figure out how to use the data I passed to the stored procedure in my insert.

The part with "values(${variant_data}" in the insert is what is giving me trouble.

I read this how to pass variant data into snowflake table using snowflake stored procedure, which said that what I needed to do isn't completely supported in Snowflake. Apparently, I would need to convert the "variant_data" to string and then back to variant to use the data in my insert but the example wasn't clear to me.

This is my stored procedure.

create or replace procedure  MY_STORED_PROCEDURE("variant_data" variant)
    returns string
    language javascript
    strict
    execute as owner
    as
    $$
    var variant_data=variant_data;
    var sql_command = `
    insert into MY_TABLE(MY_VARIANT_DATA) values(${variant_data})
    `
    $$;

If someone could explain how to use variant data in my insert, I'd appreciate it.


Solution

  • Well first steps lets check our base SQL is in a acceptable form:

    create table MY_TABLE(MY_VARIANT_DATA variant);
    
    insert into my_table(my_variant_data) values 
        (parse_json('{"key1":"value1","key2":1234}'));
    

    Invalid expression [PARSE_JSON('{"key1":"value1","key2":1234}')] in VALUES clause

    ok, so lets make that a SELECT and PARSE_JSON:

    insert into my_table(my_variant_data) select 
        (parse_json('{"key1":"value1","key2":1234}'));
    
    number of rows inserted
    1

    ok so that works for a single value (of which is really not how you should load data into snowflake, I mean if you have 1 piece of data per hour sure, its' fine but if you doing this often, your tables will be very fragmented)..

    so lets try casting to ::VARIANT

    insert into my_table(my_variant_data) values 
       ('{"key1":"value1","key2":1234}'::variant);
    

    Invalid expression [CAST('{"key1":"value1","key2":1234}' AS VARIANT)] in VALUES clause

    ok, so a multi-insert via a SELEC FROM VALUES:

    insert into my_table(my_variant_data) 
        select parse_json(column1) 
        from values
            ('{"key1":"value1","key2":1234}'),
            ('{"key1":"value2","key2":1235}');
    
    number of rows inserted
    2

    Ok, so we now have a couple of patterns of valid INSERT sql..

    lets recreate the procedure using that, this will look like "cheating" as it's a full working code, but it took me way to long to get things working. Basically I didn't read Felipe's answer too deeply, anyways, in the end I settle for string concatenation, which is a security risk, so I really wouldn't do it this way:

    create or replace procedure MY_STORED_PROCEDURE("variant_as_string" text)
        returns string
        language javascript
        strict
        execute as owner
        as
        $$
        var sql_command = "insert into my_table(my_variant_data) select (parse_json('"+ variant_as_string +"'));";
        var sql = snowflake.createStatement( {sqlText: sql_command});
        var resultSet = sql.execute();
        return sql_command;
        $$;
    

    which works! although should maybe not be used..

    -- lets the prior inserts
    truncate table MY_TABLE;
    
    -- do new  insert
    call MY_STORED_PROCEDURE('{"key1":"value4","key2":1238}');
    
    -- look at the results
    select * from my_table;
    
    MY_VARIANT_DATA
    { "key1": "value4", "key2": 1238 }

    but that also changed the input type from variant to test...

    by the way how I debugged this stuff was comment bit out like so:

    create or replace procedure MY_STORED_PROCEDURE("variant_data" variant)
        returns string
        language javascript
        strict
        execute as owner
        as
        $$
        var sql_command = "insert into my_table(my_variant_data) select (parse_json('"+ JSON.stringify(variant_data) +"'));";
        //var sql = snowflake.createStatement( {sqlText: sql_command});
        //var resultSet = sql.execute();
        return sql_command;
        $$;
    

    and read the SQL to see it made me happy, anyway the with those two line uncommented out, works for variant data:

    call MY_STORED_PROCEDURE(parse_json('{"key1":"value12","key2":12345}'));
    
    select * from my_table;
    
    MY_VARIANT_DATA
    { "key1": "value4", "key2": 1238 }
    { "key1": "value12", "key2": 12345 }

    So there you have it, two ways to inject a single row in a table. Both which possible should be avoid for SQL injection reasons, a to do that correctly, you should use Felipe's answer on that other question.

    create or replace procedure MY_STORED_PROCEDURE("variant_data" variant)
        returns string
        language javascript
        strict
        execute as owner
        as
        $$
        var sql_command = "insert into my_table(my_variant_data) select (parse_json(:1));";
        var sql = snowflake.createStatement( {sqlText: sql_command, binds:[JSON.stringify(variant_data)]});
        var resultSet = sql.execute();
        return sql_command;
        $$;
    
    call MY_STORED_PROCEDURE(parse_json('{"key1":"value123","key2":123456}'));
    

    which gives:

    MY_VARIANT_DATA
    { "key1": "value4", "key2": 1238 }
    { "key1": "value12", "key2": 12345 }
    { "key1": "value123", "key2": 123456 }