Python: Edit specific hex values in a file


I'm trying to edit a specific line of data in a .m4a file (audio file), but I can't figure out a way to do it in Python. I know there are other similar threads, but when I open a .m4a file in a hex editor program (HxD for example) it gives me different hex data than what I get from my python script, and I'm a little confused by the terminology. What I need to do is read the file with Python, convert it to the format my hex editor uses, replace the data, then convert it back and write it to the file. I don't know whether that's possible or whether there's a simpler way of doing it. I'm still new to Python, so I really just need someone to point me in the right direction. The reason for doing this is that I'm trying to change the file's metadata.

My python version: Python 3.7.4

Here's a link to the file in question: https://drive.google.com/file/d/1m8SpCLSyX265_I00MFT1IyltpTAvxntF/view?usp=sharing

My code:

with open(file, 'rb') as f:
    content = f.read().hex()
print(content)

The following is the line I need to edit (from my hex editor)

00 00 01 80 68 69 33 32

(text translation: four non-printable bytes, then hi32)

replace with:

00 00 00 00 68 69 33 32

The beginning of my file in a hex editor looks like this (HxD):

00 00 00 00 00 00 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 01 0C 60 6D 64 69 61 00 00 00 20 6D 64 68 64 00 00 00 00 D9 98 96 40 D9 B2 F7 52 00 00 AC 44 00 84 EC 00 00 00 00 00 00 00 00 22 68 64 6C 72 00 00 00 00 00 00 00 00 73 6F 75 6E 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 0C 16 6D 69 6E 66 00 00 00 10 73 6D 68 64 00 00 00 00 00 00 00 00 00 00 00 24 64 69 6E 66 00 00 00 1C 64 72 65 66 00 00 00 00 00 00 00 01 00 00 00 0C 75 72 6C 20 00 00 00 01 00 01 0B DA 73 74 62 6C 00 00 80 76 73 74 73 64 00 00 00 00 00 00 00 01 00 00 80 66 6D 70 34 61 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 02 00 10 00 00 00 00 AC 44 00 00 00 00 00 33 65 73 64 73 00 00 00 00 03 80 80 80 22 00 00 00 04 80 80 80 14 40 15 00 18 00 00 04 82 90 00 03 E8 00 05 80 80 80 02 12 10 06 80 80 80 01 02 00 00 00

The beginning of the hex I get from my Python script looks like this:

d5df0d0ef02daf279fd6b15fae5c6e0bc79bec22095ceeada5e77371afc8ee36f10773b1b2c06b1b1ee4e5cccbf67403b26fd37cc6e3cc9f11019ab604f0071872ec6c092cc20b2a6d4460c55986623b50

Solution

  • Differences in Reading Hex

    when I open a .m4a file in a hex editor program (HxD for example) it gives me different hex data than what I get from my python script.

    Reading with Python

    This is what I see in Python, showing the first 32 characters (the last line is the output):

    with open('01 Choir (Remix).m4a', 'rb') as f:
        content = f.read().hex()
    print(content[:32])
    00000020667479704d34412000000000
    

    Reading with xxd

    Using bash, again selecting the first 32 chars:

    $ xxd -ps 01\ Choir\ \(Remix\).m4a | head -c 32
    00000020667479704d34412000000000
    

    Here xxd -ps gets the hexstring of a file and head takes the first 32 characters of this output.

    Note that both commands produce the same hex.
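To compare Python's output with a hex editor more directly, the bytes can be printed in the editor's layout: uppercase, space-separated pairs, 16 per row. The following is only a minimal sketch; `hexdump_rows` is a hypothetical helper name, and the sample bytes are the first 16 bytes from the hex string shown above. It uses an f-string rather than `bytes.hex(' ')` because the separator argument is only available from Python 3.8.

```python
def hexdump_rows(data, width=16):
    """Format bytes as rows of uppercase, space-separated hex pairs."""
    rows = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        rows.append(' '.join(f'{byte:02X}' for byte in chunk))
    return rows


# The first 16 bytes of the file, taken from the hex string shown above.
sample = bytes.fromhex('00000020667479704d34412000000000')
for row in hexdump_rows(sample):
    print(row)
# prints: 00 00 00 20 66 74 79 70 4D 34 41 20 00 00 00 00
```

To dump a real file, read it with `open(path, 'rb')` and pass the resulting bytes to the helper.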

    Rewriting Hex

    The following is the line I need to edit (from my hex editor)

    0000018068693332

    replace with:

    0000000068693332

    You had half of the solution: do a string replacement on the hex and write the result back to a file. Python's regex library, re, is more powerful, but it isn't necessary here because a plain string replacement is all you need. For a fixed pattern like this one, plain replacement is also typically faster than a regex.

    If you do need to use a regex though, there are plenty of ways to edit hex.

    # replace_bytes.py
    source_str = '0000018068693332'
    replace_str = '0000000068693332'
    
    # Read the whole file and convert its bytes to a lowercase hex string.
    with open('01 Choir (Remix).m4a', 'rb') as f:
        content = f.read().hex()
    print(source_str + " in `01 Choir (Remix).m4a`:       ", source_str in content)
    # Replace the pattern, then convert the hex back to bytes and write a new file.
    content = content.replace(source_str, replace_str)
    with open('01 Choir (Remix) edited.m4a', 'wb') as f:
        f.write(bytes.fromhex(content))
    
    # Re-read the edited file to confirm the old pattern is gone.
    with open('01 Choir (Remix) edited.m4a', 'rb') as f:
        new_content = f.read().hex()
    print(source_str + " in `01 Choir (Remix) edited.m4a`:", source_str in new_content)
    

    Then running it:

    $ python replace_bytes.py
    0000018068693332 in `01 Choir (Remix).m4a`:        True
    0000018068693332 in `01 Choir (Remix) edited.m4a`: False
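
The same replacement can also be done on the raw bytes, skipping the round trip through a hex string. One reason to consider this, offered as a sketch rather than a definitive fix: a hex string has two characters per byte, so `str.replace` could in principle match a pattern that starts in the middle of a byte, whereas `bytes.replace` only matches on byte boundaries. The function name `patch_bytes` and the demo file contents are hypothetical; the demo writes a small temporary file and does not touch the real .m4a.

```python
import os
import tempfile


def patch_bytes(src_path, dst_path, old, new):
    """Copy src_path to dst_path with every occurrence of old replaced by new.

    Returns the number of occurrences found in the source.
    """
    if len(old) != len(new):
        # Changing the length would shift everything after the patch.
        raise ValueError('old and new must have the same length')
    with open(src_path, 'rb') as f:
        data = f.read()
    count = data.count(old)
    with open(dst_path, 'wb') as f:
        f.write(data.replace(old, new))
    return count


# Demo on a throwaway file (made-up contents, not the real .m4a).
old = bytes.fromhex('0000018068693332')
new = bytes.fromhex('0000000068693332')
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'in.bin')
    dst = os.path.join(tmp, 'out.bin')
    with open(src, 'wb') as f:
        f.write(b'\x10' + old + b'\x00tail')
    found = patch_bytes(src, dst, old, new)
    with open(dst, 'rb') as f:
        patched = f.read()
print(found, old in patched, new in patched)
# prints: 1 False True
```

Checking the returned count before trusting the output file is a cheap safeguard: a count of 0 means the pattern was not found, and a count above 1 means the pattern is not unique in the file.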