Tags: python, python-3.x, embedded-resource, portable-executable

Using Python to overwrite resource section in C program


I have a C program that has a resource section.

IDS_STRING 87 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 

In the hex editor it looks like this

[screenshot: hex dump of the resource section]

I use code such as this in Python to search and replace the A's:

str = b'\x00A'*40
str1 = b"BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"

if str in file:
    print("A in file")
    f.write(file.replace(str, str1))

This makes the new file look like this:

[screenshot: hex dump after the replacement]

So I am wondering why the A's are stored like '41 00' and then when I overwrite them they are just '42'.

Is this a WCHAR thing?

I did a test where I loaded the string and printed it out.

This is some text.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

But when I used my Python script to overwrite the A's with B's, it printed this:

This is some text.???????????????????????????????????????B

So, with my limited knowledge of C: if I want to put things into the resource section, should I store them as WCHAR?

UPDATE: My main issue with this is I have a hex string similar to below:

'685308358035803507835083408303508350835083508350835083083508'

I want to put that into the resource section. But if I do it the same way as the replacement above:

f.write(file.replace(str, b'685308358035803507835083408303508350835083508350835083083508'))

Then it puts it into the resource section as:

[screenshot: hex dump with the hex string inserted]

If it goes in like this, things break, apparently because the program reads 2 bytes at a time.

The reason I am asking is that when I replace the A's with my hex string and run the program, it does not work. But if I place the hex string directly into the resource section in Visual Studio and run it, it does work. When I replace with Python the bytes are '34322424...', but when the same string is placed in the resource section they are '3400220042004....'

2nd UPDATE: It seems that the resource section string table does store each character in 2 bytes.

https://learn.microsoft.com/en-us/windows/desktop/debug/pe-format#the-rsrc-section

Resource Directory Strings 
Two-byte-aligned Unicode strings, which serve as string data that is pointed to by directory entries. 
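
One way to see the two-byte layout from Python is to encode the same text both ways and compare the bytes. This is only a minimal sketch: the variable names are illustrative, and `utf-16-le` is assumed because the dump shows 'A' stored as '41 00'.

```python
text = "AB"

# One byte per character, as written by a plain bytes replacement.
as_ascii = text.encode("ascii")

# Two bytes per character, low byte first, with no byte order mark.
as_utf16 = text.encode("utf-16-le")

print(as_ascii.hex(" "))   # 41 42
print(as_utf16.hex(" "))   # 41 00 42 00
```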

Solution

  • It looks like UTF-16 encoding. So you can use regular Python Unicode strings, as long as you open and write to the file in text mode with a UTF-16 encoding.

    If you write ASCII bytes in binary mode, each character is represented by a single byte. If you write in text mode with a UTF-16 encoding, each character in the Basic Multilingual Plane is represented by two bytes. If the text only uses low Unicode code points (such as ASCII), the second byte of every character is a null byte. If you write some Chinese text, both bytes are needed.

    The hex dump you posted doesn't show a BOM at the start, so you might have to use utf-16le instead of utf-16 (Python's plain utf-16 codec writes a BOM when encoding).

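    # utf-16le: two bytes per character, little-endian, no BOM.
    # newline='' turns off newline translation so the content is left as-is.
    with open('foo.txt', 'r', encoding='utf-16le', newline='') as fp:
        text = fp.read()

    # Writing through the same codec stores each character as two bytes again.
    with open('foo.txt', 'w', encoding='utf-16le', newline='') as fp:
        fp.write(text.replace('AAAAAA', 'BBBBBB'))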
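
An executable is not a text file, so decoding the whole thing as UTF-16 may fail or alter bytes outside the string. An alternative is to stay in binary mode and encode only the search and replacement strings as UTF-16LE. The sketch below makes several assumptions: the helper name `patch_utf16_placeholder` is hypothetical, the placeholder occurs once, and padding a shorter replacement with NUL characters is acceptable to the program. It keeps the patched region the same length, because a replacement of a different size would shift every byte after it.

```python
def patch_utf16_placeholder(data: bytes, placeholder: str,
                            replacement: str, pad: str = "\x00") -> bytes:
    """Overwrite a UTF-16LE placeholder in data without changing its length.

    Hypothetical helper for illustration; pad must be a single
    character that encodes to exactly two bytes.
    """
    old = placeholder.encode("utf-16-le")
    new = replacement.encode("utf-16-le")
    pad_bytes = pad.encode("utf-16-le")

    if len(pad_bytes) != 2:
        raise ValueError("pad must encode to exactly two bytes")
    if len(new) > len(old):
        raise ValueError("replacement is longer than the placeholder")
    if old not in data:
        raise ValueError("placeholder not found")

    # Fill the remaining space so the byte count stays identical.
    new += pad_bytes * ((len(old) - len(new)) // 2)

    # Replace only the first occurrence.
    return data.replace(old, new, 1)


# In-memory demonstration with two marker bytes on each side.
sample = b"\x12\x34" + ("A" * 40).encode("utf-16-le") + b"\x56\x78"
patched = patch_utf16_placeholder(sample, "A" * 40, "B" * 40)
print(len(sample) == len(patched))
```

To apply this to a real file, read it with `open(path, 'rb')`, pass the bytes through the helper, and write the result with `open(path, 'wb')`. Note that a string longer than the 40-character placeholder will not fit; the placeholder in the resource file would have to be made at least as long as the data first.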