I have stumbled upon an interesting piece of code written in Python:
# Python 2: unicode() and decoding a text string do not exist in Python 3
from struct import pack
chars = [109, 0, 97, 0, 110, 0, 105, 0, 102, 0, 101, 0, 115, 0, 116, 0]
length = 16
data = ""
for i in range(0, length):
    ch = pack("=b", chars[i])
    data += unicode(ch, errors='ignore')
    if data[-2:] == "\x00\x00":
        break
end = data.find("\x00\x00")
if end != -1:
    data = data[:end]
print(len(data.decode("utf-16", "replace")))  # outputs 8, string is 'manifest'
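For comparison, here is a Python 3 sketch of the same decoding. It assumes the bytes are little-endian UTF-16, because the 109, 0, 97, 0 pattern puts the low byte first. It makes one deliberate change: the two-NUL terminator is only searched for at even offsets. A plain find("\x00\x00") can also match a zero high byte followed by the zero low byte of the next code unit. The function name decode_utf16_cstring is hypothetical.

```python
def decode_utf16_cstring(raw: bytes, encoding: str = "utf-16-le") -> str:
    """Decode a buffer of UTF-16 code units that may end in a 00 00 terminator.

    The terminator is only looked for on 2-byte boundaries, so a zero byte that
    ends one code unit and a zero byte that starts the next are not confused
    with it.
    """
    for i in range(0, len(raw) - 1, 2):
        if raw[i] == 0 and raw[i + 1] == 0:
            raw = raw[:i]
            break
    # errors="replace" mirrors the "replace" argument in the original code.
    return raw.decode(encoding, errors="replace")


chars = [109, 0, 97, 0, 110, 0, 105, 0, 102, 0, 101, 0, 115, 0, 116, 0]
text = decode_utf16_cstring(bytes(chars))
print(text, len(text))  # manifest 8

# U+0100 is stored as 00 01 here, so a naive find() matches at offset 1
# and would cut the first code unit ("m") in half.
tricky = bytes([0x6D, 0x00, 0x00, 0x01])
print(tricky.find(b"\x00\x00"))                   # 1
print(decode_utf16_cstring(tricky) == "m\u0100")  # True
```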
As you can see, Python decodes the UTF-16 data properly.
However, when I port the code to PHP, I get the wrong result:
$chars = array(109, 0, 97, 0, 110, 0, 105, 0, 102, 0, 101, 0, 115, 0, 116, 0);
$length = 16;
$data = "";
for ($i = 0; $i < $length; $i++) {
    $data .= pack("c", $chars[$i]);
    if (substr($data, -2) == "\x00\x00") {
        break;
    }
}
$end = strpos($data, "\x00\x00");
if ($end !== false) {
    $data = substr($data, 0, $end);
}
// mb_convert_encoding() doesn't seem to work
echo strlen($data); // outputs 16
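The 16 is partly a counting issue. The PHP loop never decodes anything, and strlen() counts bytes, while the Python version counts characters after decoding. The small Python comparison below measures the same buffer both ways, assuming little-endian UTF-16. It also shows why simply halving the byte count is not a general answer. On the PHP side, mb_strlen($data, 'UTF-16LE') should return a character count instead, under the same little-endian assumption.

```python
raw = bytes([109, 0, 97, 0, 110, 0, 105, 0, 102, 0, 101, 0, 115, 0, 116, 0])

byte_count = len(raw)                      # what strlen() reports in PHP
char_count = len(raw.decode("utf-16-le"))  # what the Python code reports
print(byte_count, char_count)              # 16 8

# Halving the byte count is not a general fix: characters outside the BMP
# are stored as a surrogate pair, i.e. four bytes in UTF-16.
emoji = "\U0001F600"
print(len(emoji.encode("utf-16-le")), len(emoji))  # 4 1
```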
The only solution I see is to just give up on the UTF magic and change the loop to:
for ($i = 0; $i < $length; $i+=2)
Is there anything I can do about this, or should I just use the modified for loop?
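Stepping by 2 keeps only the low byte of each UTF-16LE code unit and treats it as a whole character. That is effectively Latin-1, so it can only be right for code points up to U+00FF. In PHP, even those end up as single Latin-1 bytes rather than UTF-8. The Python sketch below models that loop with a hypothetical helper, every_other_byte, and compares it with a real decode.

```python
def every_other_byte(raw: bytes) -> str:
    """Hypothetical model of the `$i += 2` loop: keep the low byte of each
    little-endian code unit and read it as one character (i.e. Latin-1)."""
    return "".join(chr(b) for b in raw[0::2])


for word in ["manifest", "café", "Жук"]:
    raw = word.encode("utf-16-le")
    print(word, every_other_byte(raw) == word, raw.decode("utf-16-le") == word)

# Expected output:
# manifest True True
# café True True
# Жук False True   (the high byte 0x04 of each Cyrillic letter is dropped)
```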
Thank you.
First of all, take a look at the question How can I convert array of bytes to a string in PHP?
Using the solution from there, you can pack your byte array into a binary string and then convert it from UTF-16 to UTF-8 with iconv():
$chars = array(109, 0, 97, 0, 110, 0, 105, 0, 102, 0, 101, 0, 115, 0, 116, 0);
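// Pack every value as an unsigned char, giving a 16-byte binary string
$str = call_user_func_array("pack", array_merge(array("C*"), $chars));
// The data is little-endian UTF-16 (low byte first), so name the byte order explicitly
$convertedStr = iconv('UTF-16LE', 'UTF-8', $str);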
var_dump($str);
var_dump($convertedStr);
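Executing this script outputs the following (the first string really is 16 bytes long; its NUL bytes just don't show up in the terminal):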
string(16) "manifest"
string(8) "manifest"
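A note on the encoding label: without a byte order mark (BOM), the bytes alone do not say which byte of each code unit comes first. The Unicode Standard specifies big-endian for unmarked UTF-16, and some iconv implementations follow that. This buffer is little-endian, since 109, 0 is "m" with the low byte first. The Python illustration below shows what each reading produces and how a BOM settles the question. In PHP, passing 'UTF-16LE' to iconv() removes the ambiguity the same way.

```python
raw = bytes([109, 0, 97, 0])  # "ma" with the low byte first

little = raw.decode("utf-16-le")
big = raw.decode("utf-16-be")
print(little)                      # ma
print([hex(ord(c)) for c in big])  # ['0x6d00', '0x6100'], CJK ideographs

# A byte order mark removes the ambiguity: FF FE announces little-endian
# and is consumed by the generic "utf-16" codec.
print((b"\xff\xfe" + raw).decode("utf-16"))  # ma
```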