Search code examples
phpsql-serverpdoodbcnvarchar

How to use PHP PDO ODBC with SQL Server and Unicode Characters?


PHP PDO ODBC doesn't seem to be able to store NVARCHAR characters (UTF-16) without some manual encoding. This seems like a pretty commun "bug" all around the internet and no one seems to have a definitive solution.

How to reproduce the bug

  1. Try to insert the following japanese characters using PDO : こんにちは (It means hello)
  2. The following will be stored in your database : ã“ã‚“ã«ã¡ã¯
  3. Then retrieve it via PDO and print it on the screen, you will get : こんにちは

It's not so bad but it's not good either. PHP works but when you have other applications that are not in PHP that access that information from your database they will get the wrong string : ã“ã‚“ã«ã¡ã¯.

Ideally you would want to have こんにちは everywhere.

The symptom

It seems that PDO doesn't have any notion of NVARCHAR, aka characters that are encoded using 16 bits. In fact, everything that you pass to or retrieve from SQL SERVER via PDO will be by chunk of 8 bits. How to "prove" it? Here it is :

  1. You start with your japanese string こんにちは.

You first have to know that PHP consider string as binary and that (if it's set like that) it will store them in UTF-8.

So if we look at the binary representation of こんにちは, you will get E38193E38293E381ABE381A1E381AF which is also the binary representation that SQL SERVER will give you for ã“ã‚“ã«ã¡ã¯. (depends on your collation)

  1. Next, let's put it into UTF-16 because that's the format of NVARCHAR.

    $utf16_string = mb_convert_encoding('こんにちは', 'UTF-16LE');

The following will change the binary representation in PHP of こんにちは to 533093306B3061306F30 which is exactly the binary representation of こんにちは in SQL SERVER NVARCHAR.

  1. Next try to save it in SQL via PDO and you will get the following : S0“0k0a0o0

The SQL SERVER binary representation of S0“0k0a0o0 in VARCHAR is 533093306B3061306F30 which is also the binary representation of こんにちは in NVARCHAR.

A dirty solution

You can use the following to save and retrieve unicode data in SQL SERVER via PDO ODBC but it's ugly...

  1. You want to transform the data into exactly the same binary representation that SQL SERVER NVARCHAR will store it

    mb_convert_encoding('こんにちは', 'UTF-16LE');

  2. You want to receive it as a binary on SQL SERVER side and then transform it to NVARCHAR.

    @binary VARBINARY(40) SELECT @string = CONVERT(NVARCHAR(20), @binary);

  3. At this point you have こんにちは in your database. To retrieve it you want to resend it to PHP as a binary

  4. Once you get the binary in PHP, PHP will already have transform it into a hex string... So, you want to convert the hex string into a binary and then change the encoding from utf-16 to utf-8

    $result = mb_convert_encoding(hex2bin($string), 'UTF-8', 'UTF-16LE');

And you will be back with your こんにちは when you echo it to your webpage.

Basically, that's what the SQL driver should be doing for me instead of me doing it manually.

Did I forgot to configure something or I have to do it manually?


Solution

  • No, I don't think you forgot to configure anything. In fact, your explanation is the best I've found so far regarding the long-standing "issues" between PHP and Microsoft ODBC drivers. Those issues are especially puzzling given that the PDO_ODBC page says:

    On Windows, PDO_ODBC ... is the recommended driver for connecting to Microsoft SQL Server databases.

    However, on Windows they also offer PDO_SQLSRV which actually does appear to work correctly.

    So it seems that PDO_ODBC "doesn't have any notion of NVARCHAR", rather than PDO as a whole.

    (Similar problems arise when trying to use PHP with Microsoft Access ODBC if Unicode characters are involved)

    Conclusion: PHP support for ODBC continues to be a bit of a mess, at least where Microsoft databases are concerned.