Search code examples
compilationreverse-engineering

Why strings are stored in the following way in PE file


I opened a .exe file and I found a string "Premium" was stored in the following way

50 00 72 00 65 00 6D 00 69 00 75 00 6D 00

I just don't know why "00" is appended to each of characters and what's its usage.

Thanks,


Solution

  • It's probably a UTF-16 encoding of a Unicode string. Here's an example using Python:

    >>> u"Premium".encode("utf16")
    '\xff\xfeP\x00r\x00e\x00m\x00i\x00u\x00m\x00'
    #        ^    ^    ^    ^    ^    ^    ^   
    

    After the byte marker to indicate endianness, you can see alternating letters and null bytes.


    \xff\xfe is the byte-order marker; it indicates that the low-order byte of each 16-bit value comes first. (If the high-order byte came first, the byte marker would be \xfe\xff; there's nothing particularly meaningful about which marker means which.)

    Each character is then encoded as a 16-bit value. For many values, the UTF-16 encoding is simply the straightforward unsigned 16-bit representation of its Unicode code point. Specifically, 8-bit ASCII values simply use a null byte as the high-order byte, and its ASCII value as the low-order byte.