I'm sending a gzipped string from C# (using SharpZipLib) to PHP, where I decompress it with readgzfile. This works; however, each character in the string is followed by two strange characters (in vim in the console they are displayed as ^@). I also tried gzopen/gzread, with the same results.
When I clean the non-ASCII characters from the string with $clean = preg_replace('/[^(\x20-\x7F)]*/','', $string); the $clean string is identical to the one in C#.
While this works, I would like to know what is happening and why so I can make sure this will always work or come up with a better solution.
Given that the string is created on Windows, it's likely that some multibyte encoding is being used.
You can verify this yourself by using bin2hex($string) and checking the hexadecimal representation instead of relying on vim.
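As a rough illustration of what that check might show, here is a minimal sketch. The sample bytes are an assumption standing in for the decompressed data (the word "Hi" encoded as UTF-16LE); the real output depends on what the C# side actually sent.

```php
<?php
// Hypothetical sample: "Hi" as UTF-16LE bytes, standing in for the decompressed data.
$string = "H\x00i\x00";

// In UTF-16LE every ASCII character is followed by a 00 byte,
// which vim displays as ^@.
echo bin2hex($string), PHP_EOL; // 48006900

// The same text in UTF-8, for comparison.
echo bin2hex('Hi'), PHP_EOL;    // 4869
```

If the dump shows a 00 byte after each ASCII character, the data is most likely UTF-16 little-endian; a 00 byte before each character would point to big-endian.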
If either UTF-16 or UCS-2 is being used, you can convert it like so:
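// iconv($from, $to, $str)
// Without a BOM, the byte order assumed for 'UTF-16' depends on the iconv implementation;
// in that case name it explicitly, e.g. 'UTF-16LE' (what C#'s Encoding.Unicode produces).
$clean = iconv('UTF-16', 'UTF-8', $string);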
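If it is unclear whether the data carries a byte order mark, the conversion can be made a little more defensive. The following is only a sketch: utf16ToUtf8 is a hypothetical helper name, and the little-endian default is an assumption based on the string coming from C# on Windows.

```php
<?php
// Hypothetical helper: convert UTF-16 bytes to UTF-8, honouring a BOM when present.
function utf16ToUtf8(string $bytes, string $default = 'UTF-16LE'): string
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {          // little-endian BOM
        $from  = 'UTF-16LE';
        $bytes = substr($bytes, 2);
    } elseif ($bom === "\xFE\xFF") {    // big-endian BOM
        $from  = 'UTF-16BE';
        $bytes = substr($bytes, 2);
    } else {
        $from = $default;               // assumption: no BOM means little-endian
    }

    $result = iconv($from, 'UTF-8', $bytes);
    if ($result === false) {
        throw new RuntimeException("Could not convert from $from to UTF-8");
    }
    return $result;
}

// Sample input: "Hi" as UTF-16LE without a BOM.
$string = "H\x00i\x00";
echo utf16ToUtf8($string), PHP_EOL; // Hi
```

Unlike stripping bytes with preg_replace, this keeps characters outside the ASCII range intact. If the C# side can be changed, encoding the string as UTF-8 there (for example with Encoding.UTF8.GetBytes) before compressing would make the conversion unnecessary.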