I trying out the NORMALIZE function with NFKC in bigquery from the documentation, I see that I can convert a string to a readable format. For example
WITH EquivalentNames AS (
SELECT name
FROM UNNEST([
'Jane\u2004Doe',
'\u0026 Hello'
]) AS name
)
SELECT
NORMALIZE(name, NFKC) AS normalized_str
FROM EquivalentNames
GROUP BY 1;
The ampersand character shows up correctly, but I have a table, with a column of STRING with unicode character in its values, but I'm not able to use NORMALIZE to convert it to a readable format.
I've also tried some of the other solutions presented Decode Unicode's to Local language in Bigquery but nothing is working yet.
You posted a question about NORMALIZE
, but didn't make your goals clear.
Here I'll answer the question about NORMALIZE
- to point out that it probably doesn't do what you are expecting it to do. But at least it's acting as expected.
There are many ways to encode the same string with Unicode. Normalize chooses one, while preserving the string.
See this query:
SELECT *, a=b ab, a=c ac, a=d ad, b=c bc, b=d bd, c=d cd
FROM (
SELECT NORMALIZE('hello ñá 😞', NFC) a
, NORMALIZE('hello ñá 😞', NFKC) b
, NORMALIZE('hello ñá 😞', NFD) c
, NORMALIZE('hello ñá 😞', NFKD) d
)
As you see - every time you get the same string, they just have different non-visible representations.