Search code examples
sqlutf-8snowflake-cloud-data-platformdecode

Is There a Snowflake Update SQL Statement to Decode UTF-8 Bytes?


I have data in a Snowflake table column that looks like this...

These are actually Japanese characters \xe3\x81\xaa\xe3\x81\x9c\xe6\x97\xa5\xe9\x8a\x80\xe3\x81\xa0\xe3\x81\x91\xe9\x81\x95\xe3\x81\x86\xe3\x81\xae\xe3\x81\x8b\xef\xbc\x9f 

That is the complete value of one column

The UTF-8 decoded value of those characters above is なぜ日銀だけ違うのか?

Can someone suggest an update SQL statement that would change the value of the column to

These are actually Japanese characters なぜ日銀だけ違うのか?

Better still, is there an update statement that will decode any UTF-8 bytes to convert them into the desired characters


Solution

  • A quick JS UDF solves the problem:

    create or replace function decode_utf8(S string)
    returns string
    language javascript
    as
    $$
    return decodeURIComponent(escape(S));
    $$;
    

    In action:

    select decode_utf8('These are actually Japanese characters \xe3\x81\xaa\xe3\x81\x9c\xe6\x97\xa5\xe9\x8a\x80\xe3\x81\xa0\xe3\x81\x91\xe9\x81\x95\xe3\x81\x86\xe3\x81\xae\xe3\x81\x8b\xef\xbc\x9f ')
    

    Screenshot as proof:

    enter image description here

    (based on https://stackoverflow.com/a/13691499/132438)