
HASHBYTES equivalent in MySQL when working with non-English text


When I hash text that contains non-English characters, MySQL and SQL Server return different MD5 values; for English text the results match. Does anyone know how to get the same result on both?

MySQL

  SELECT  MD5( 'سلام')
------------- result: 78903c575b0dda53c4a7644a2dd36d0e

SQL-Server

SELECT CONVERT(VARCHAR(50), HASHBYTES('MD5',  'سلام'), 2) 
-------------- result:3C50F6458899B3C0988BE358290F5F24


SELECT CONVERT(nVARCHAR(50), HASHBYTES('MD5',  N'سلام'), 2) 
-------------- result:0381CA5081FBC68B2F55F2F2C21399D7


Solution

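  • MySQL returns D8B3D984D8A7D985 for SELECT HEX(CAST('سلام' AS BINARY)), which shows that it encodes the string as UTF-8 before hashing it. SQL Server's nvarchar type stores UTF-16, so HASHBYTES is given different bytes and returns a different hash.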

    On SQL Server 2019 you can store the value in a varchar column that uses the LATIN1_GENERAL_100_CI_AS_SC_UTF8 collation (UTF-8 collations are not supported on SQL Server 2017), for example:

    create table #Test (
      UTF16 nvarchar(max),
      UTF8 varchar(max)  COLLATE LATIN1_GENERAL_100_CI_AS_SC_UTF8
    )
    insert #Test values (N'سلام', N'سلام');
    
    select UTF16 from #Test;
    select CAST(UTF16 as varbinary) as [UTF16-Bytes] from #Test;
    select UTF8 from #Test;
    select CAST(UTF8 as varbinary) as [UTF8-Bytes] from #Test;
    

    Which returns:

    UTF16
    سلام
    
    UTF16-Bytes
    0x3306440627064506
    
    UTF8
    سلام
    
    UTF8-Bytes
    0xD8B3D984D8A7D985
    

    And then with hashbytes():

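    -- varbinary(max): a bare varbinary in CAST defaults to 30 bytes and would truncate longer strings
    select hashbytes('MD5', cast(UTF16 as varbinary(max))) as [UTF16-Hash] from #Test;
    select hashbytes('MD5', cast(UTF8 as varbinary(max))) as [UTF8-Hash] from #Test;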
    

    Which returns:

    UTF16-Hash
    0x0381CA5081FBC68B2F55F2F2C21399D7
    
    UTF8-Hash
    0x78903C575B0DDA53C4A7644A2DD36D0E
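
    If the SQL Server side cannot be changed (for example on SQL Server 2017 or earlier), the same idea can be applied in reverse: have MySQL hash the UTF-16LE bytes so that the result lines up with HASHBYTES over an N'...' literal. A minimal sketch, assuming the connection character set is utf8mb4 so the literal reaches the server intact, and assuming a MySQL version that ships the utf16le character set (5.6 or later, as far as I know):

    ```sql
    -- Show the bytes MySQL will hash after transcoding to UTF-16LE;
    -- compare with the UTF16-Bytes output from SQL Server above.
    SELECT HEX(CONVERT('سلام' USING utf16le)) AS utf16le_bytes;

    -- MD5 over those bytes, upper-cased to match the format of
    -- CONVERT(VARCHAR(32), HASHBYTES('MD5', N'سلام'), 2) on SQL Server.
    SELECT UPPER(MD5(CONVERT('سلام' USING utf16le))) AS utf16le_md5;
    ```

    This is untested against the data in the question, so compare the output with the UTF16-Bytes and UTF16-Hash values above before relying on it.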
    

    Hope this helps!