Search code examples
sql-serverunicodeinternationalizationsql-server-2008-r2globalization

Normalize unicode string in SQL Server?


Is there a function in SQL Server to normalize a unicode string? e.g.

UPDATE Orders SET Notes = NormalizeString(Notes, 'FormC')

Unicode Normalization Forms:

  • C​omposition (C): A + ¨ becomes Ä
  • D​ecomposition (D): Ä becomes A + ¨
  • Compatible Composition (KC): A + ¨ + + n becomes Ä + f + i + n
  • Compatible Decomposition (KD): Ä + + n becomes A + ¨ + f + i + n

i cannot find any built-in function, so i assume there is none.


Ideally, if there can be only one, then i happen to need Form C today:

Unicode normalization form C, canonical composition. Transforms each decomposed grouping, consisting of a base character plus combining characters, to the canonical precomposed equivalent. For example, A + ¨ becomes Ä.

See also


Solution

  • Sorry, no, there is no such function in any version of SQL Server to date (2012 test builds). Comparisons can be correctly composition-insensitive, but there isn't a function to convert character composition usage to one normal form.

    It has been suggested for a future version of the ANSI standard under the syntax NORMALIZE(string, NFC) but it's going to be a long time before this makes it to the real world. For now if you want to do normalisation you'll have to do it in a proper programming language with better string-handling capabilities, either by pulling the string out of the database or by writing a CLR stored procedure to do it.