Search code examples
sql-serverencodingsql-server-2012varcharvarbinary

How to encode language specific chars while converting varbinary() to varchar(max) in SQL Server 2012?


I am trying to convert a database column DATA from varbinary() to varchar(max) in SQL Server 2012.

I am using this code to handle the conversion:

SELECT CONVERT(VARCHAR(MAX), DATA) FROM [dbo].[TABLE_NAME]

and the resulting row is as follows :

VW 6501 Çamaşır

I am having trouble with language specific characters (language is Turkish in my case for now)

How do I get over this encoding problem in SQL Server 2012?

Is there a generic way to do this conversion for any language, considering loss of data/encoding problems for any given language?

This may sound like a rookie question but I really would appreciate any suggestions or answer.

Thank you,


Solution

  • In general, SQL Server does not hold UTF-8 in high regard. However, .NET has methods to do this and you can get at them via CLR integration.

    Compile this using C#:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;
    
    namespace UtfLib
    {
        public static class UtfMethods
        {
            [SqlFunction(IsDeterministic = true, IsPrecise = true)]
            public static SqlBinary NVarCharToUtf8(SqlString inputText)
            {
                if (inputText.IsNull)
                    return new SqlBinary(); // (null)
    
                return new SqlBinary(Encoding.UTF8.GetBytes(inputText.Value));
            }
    
            [SqlFunction(IsDeterministic = true, IsPrecise = true)]
            public static SqlString Utf8ToNVarChar(SqlBinary inputBytes)
            {
                if (inputBytes.IsNull)
                    return new SqlString(); // (null)
    
                return new SqlString(Encoding.UTF8.GetString(inputBytes.Value));
            }
        }
    }
    

    Import the assembly into your database and create the external functions:

    CREATE ASSEMBLY UtfLib
    FROM 'C:\UtfLib.dll'
    GO
    CREATE FUNCTION NVarCharToUtf8 (@InputText NVARCHAR(MAX))
    RETURNS VARBINARY(MAX)
    AS EXTERNAL NAME UtfLib.[UtfLib.UtfMethods].NVarCharToUtf8
    GO
    CREATE FUNCTION Utf8ToNVarChar (@InputBytes VARBINARY(MAX))
    RETURNS NVARCHAR(MAX)
    AS EXTERNAL NAME UtfLib.[UtfLib.UtfMethods].Utf8ToNVarChar
    

    Last step, you have to enable clr

    sp_configure 'clr enabled',1
    GO
    RECONFIGURE
    GO
    sp_configure 'clr enabled'  -- make sure it took
    GO
    

    and voilà!

    SELECT dbo.Utf8ToNVarChar(DATA) FROM [dbo].[TABLE_NAME]