Search code examples
unicodeencodinggoogle-bigquerybase64byte

How to determine if base64 and unicode character are equivalent?


I'm trying to figure out if the two encodings of the character are the same:

SELECT 
    ~b"1", 
    code_points_to_string([0x380]), 
    "\u0380"

The first is base64-encoded and the second is a string. How can I determine if these are equivalent? Is there some sort of function where I can, for example extract the code point from a base64-string in BigQuery?

enter image description here


Solution

  • You have mentioned ~b"1" as base64-encoded but according to this google cloud documentation ~ X is a Bitwise not operator. Here you are converting the string value to bytes and then performing Bitwise Not operation on it (Though while showing in UI BQ shows in base64 encoded). To convert it into base64-encoded you can use the TO_BASE64 function.

    You can use the to_code_points function to extract the code points from a base 64 encoded string.

    select *
    ,from_base64(d) decoded_val_bytes
    ,SAFE_CONVERT_BYTES_TO_STRING(from_base64(d)) sring_value
    ,to_code_points(from_base64(d)) cp
    from (
     select
     ~b"foo" a,
     code_points_to_string([102,111,111]) b
     ,to_base64(b"foo") d
     ,to_code_points("foo") e
    )
    

    to_code_points function is to convert string or bytes into an array of INT64 and code_points_to_string function is to convert an array of INT64 into STRING.