Search code examples
sql-serverregexstored-proceduresnon-ascii-characters

Replacing non-ascii characters in MSSQL


I have done quite a bit of research and can not seem to find any answers. What I need to do is the following:

REPLACE all characters in a MSSQL column which are non ascii characters with their ascii equivalents. For example:

ë --> e
ï --> i
ñ --> n

I have read through the following which may seem similar but are, instead of replacing, eliminating/removing these characters (which is not ideal).

How do I remove extended ASCII characters from a string in T-SQL?

Purpose of [^\x20-\x7E] in regular expressions

http://jamesveitch.com/t-sql-function-replace-non-printable-ascii-chars/

http://www.sqlservercentral.com/Forums/Topic853088-149-1.aspx

Thanks to @Eric below and the selected answer.

(Helpful reference: How to remove accents and all chars <> a..z in sql-server?)


Solution

  • You cannot do it any other way than the old-fashioned, "hard" way (in any language, even, not only SQL).

    Since in many (spoken/written) languages, accented characters are not the same as non-accented ones, it's actually just a visual similarity, so there is no true correspondance. Some letters can look like just the same with a symbol added, but actually have a completely different "sound" and different rules (example).

    Just build a table, or array, or whatever of the correspondances you know, or find, or can think of.

    char | replacement
    ------------------
    à    | a
    è    | e
    é    | e
    ñ    | n
    ç    | c
    ß    | B
    ...
    

    Then, loop on your data and replacement chars to replace all characters by their correspondance.