Currently streaming Change Data Capture events from MongoDB into snowflake, would like to apply them to the raw data that is already there.
Let's say I have a table like this:
+---------------------+-----------------+-----------+
| key | value | document |
+---------------------+-----------------+-----------+
| foo.bar | "changed value" | <variant> |
| foo.stuff.anArray.1 | 1000 | <variant> |
| ... | ... | ... |
+---------------------+-----------------+-----------+
Where variant contains a very heavily nested JSON ex:
{
"foo": {
"bar": "Some info",
"baz": "Other info",
"stuff": {
"anArray": [1, 2, 3],
"things": "More nested info"
}
}
}
I would like to use OBJECT_DELETE
and OBJECT_INSERT
functions to update this nested variant data in snowflake.
Tried making a js UDF but eval()
is not supported.
Other approaches like writing a UDF that does key.split(".")
and then recursively walking the structure and updating the field seem to take a long time and fail with JavaScript out of memory error: UDF thread memory limit exceeded
in some cases.
Looking for a bit more efficient way to solve this issue.
I've faced a similar problem and used a generic UDF to solve it. Here is an example of a UDF implementation that will solve what you need:
create or replace function edit_nested_entity("variant_object" variant, "path" string, "value" string)
returns variant
language javascript
as
$$
// https://stackoverflow.com/questions/6491463/accessing-nested-javascript-objects-and-arrays-by-string-path?page=1&tab=votes#tab-top
Object.byString = function(o, s) {
s = s.replace(/\[(\w+)\]/g, '.$1'); // convert indexes to properties
s = s.replace(/^\./, ''); // strip a leading dot
var a = s.split('.');
for (var i = 0, n = a.length; i < n; ++i) {
var k = a[i];
if (k in o) {
o = o[k];
} else {
return;
}
}
return o;
}
// get the entity base
nested_entity = Object.byString(variant_object, path)
// update the value
nested_entity = value
return variant_object;
$$;
And now you'll need to run the following SQL command to achieve what you need:
UPDATE t1
SET document = edit_nested_entity(document, key, value)
You'll maybe do some fine-tuning for this UDF to be more generic (or to use different UDF for different data types), but this will work.