Search code examples
snowflake-cloud-data-platformvariant

Nested variant updating and deleting in snowflake


Currently streaming Change Data Capture events from MongoDB into snowflake, would like to apply them to the raw data that is already there.

Let's say I have a table like this:

+---------------------+-----------------+-----------+
|         key         |      value      | document  |
+---------------------+-----------------+-----------+
| foo.bar             | "changed value" | <variant> |
| foo.stuff.anArray.1 | 1000            | <variant> |
| ...                 | ...             | ...       |
+---------------------+-----------------+-----------+

Where variant contains a very heavily nested JSON ex:

{
    "foo": {
        "bar": "Some info",
        "baz": "Other info",
        "stuff": {
            "anArray": [1, 2, 3],
            "things": "More nested info"
        }
    }
}

I would like to use OBJECT_DELETE and OBJECT_INSERT functions to update this nested variant data in snowflake.

Tried making a js UDF but eval() is not supported.

Other approaches like writing a UDF that does key.split(".") and then recursively walking the structure and updating the field seem to take a long time and fail with JavaScript out of memory error: UDF thread memory limit exceeded in some cases.

Looking for a bit more efficient way to solve this issue.


Solution

  • I've faced a similar problem and used a generic UDF to solve it. Here is an example of a UDF implementation that will solve what you need:

    create or replace function edit_nested_entity("variant_object" variant, "path" string, "value" string)
    returns variant
    language javascript
    as
    $$
    // https://stackoverflow.com/questions/6491463/accessing-nested-javascript-objects-and-arrays-by-string-path?page=1&tab=votes#tab-top
        Object.byString = function(o, s) {
            s = s.replace(/\[(\w+)\]/g, '.$1'); // convert indexes to properties
            s = s.replace(/^\./, '');           // strip a leading dot
            var a = s.split('.');
            for (var i = 0, n = a.length; i < n; ++i) {
                var k = a[i];
                if (k in o) {
                    o = o[k];
                } else {
                    return;
                }
            }
            return o;
       }
       // get the entity base
       nested_entity = Object.byString(variant_object, path)
       // update the value
       nested_entity = value
       return variant_object;
    $$;
    
    

    And now you'll need to run the following SQL command to achieve what you need:

    UPDATE t1
    SET document = edit_nested_entity(document, key, value) 
    

    You'll maybe do some fine-tuning for this UDF to be more generic (or to use different UDF for different data types), but this will work.