I have data in a Snowflake table column that looks like this...
These are actually Japanese characters \xe3\x81\xaa\xe3\x81\x9c\xe6\x97\xa5\xe9\x8a\x80\xe3\x81\xa0\xe3\x81\x91\xe9\x81\x95\xe3\x81\x86\xe3\x81\xae\xe3\x81\x8b\xef\xbc\x9f
That is the complete value of one column
The UTF-8 decoded value of those characters above is なぜ日銀だけ違うのか?
Can someone suggest an update SQL statement that would change the value of the column to
These are actually Japanese characters なぜ日銀だけ違うのか?
Better still, is there an update statement that will decode any UTF-8 bytes to convert them into the desired characters
A quick JS UDF solves the problem:
create or replace function decode_utf8(S string)
returns string
language javascript
as
$$
return decodeURIComponent(escape(S));
$$;
In action:
select decode_utf8('These are actually Japanese characters \xe3\x81\xaa\xe3\x81\x9c\xe6\x97\xa5\xe9\x8a\x80\xe3\x81\xa0\xe3\x81\x91\xe9\x81\x95\xe3\x81\x86\xe3\x81\xae\xe3\x81\x8b\xef\xbc\x9f ')
Screenshot as proof:
(based on https://stackoverflow.com/a/13691499/132438)