Search code examples
sqlunicodeamazon-redshift

Replace special Unicode characters in Redshift


I trying to remove the extra character 'u202C ' at the end of the line, but it nothing works. Where is the problem?

SELECT regexp_replace('race‬', '[\u202C]'), len(regexp_replace('race‬', '[\u202C]'))

And in general, how in Redshift to remove from the string everything that is not letters or numbers?


Solution

  • In these cases I like to remove all non-ascii characters but this may remove some characters that you want. The regexp string I use for this is '[^'||chr(1)||'-'||chr(127)||']' which matches anything that is NOT between ascii 1 and ascii 127.

    I have had some success with '[^[:print:][:cntrl:]]' but hasn't done the trick in all cases.

    You can also just add the character in question to the search string but this can get lost depending on what bench you have. It also may not be readable. Not always the best solution and can break when you change tools.