Compile Python code to PyCodeObject and marshal to pyc file


I am trying to compile Python code to a PyCodeObject and marshal it to a pyc file. But when I import the resulting pyc file, the import fails with the traceback "ValueError: bad marshal data (unknown type code)". Here is my code:

import struct
import marshal
import time
import imp 
import sys 

def dump(co, filename):
    magic = imp.get_magic()
    time_string = struct.pack('L', int(time.time()))
    f = open("merge.pyc", "wb")
    f.write(magic)
    f.write(time_string)
    marshal.dump(dco, f)
    f.close()

demo = open("demo.py").read()
dco = compile(demo, "demo.py", "exec")
dump(dco, "dco.pyc")

Solution

  • I guess the ValueError is raised because the timestamp field has the wrong size. Make sure the fmt argument passed to struct.pack (here L, whose standard size is 4 bytes) produces a field of the length your Python executable expects.

    Since the value of the timestamp does not affect the final result, getting the field size right is enough here. Try Q instead of L to ensure 8 bytes if you are using a 64-bit version of CPython.

    === UPDATED BEGIN ===

    Sorry, I made a serious mistake in the previous description.

    L without any prefix means native mode: the size and byte order are those of the machine's native format (on a 64-bit CPython build for Linux or macOS, L is 8 bytes). However, the timestamp written to a pyc file is always 4 bytes, little-endian, in both Python 2 and Python 3.

    If you are using Python 2, use <L instead of L. With the < prefix, L has its standard size of 4 bytes and little-endian byte order.
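
To make the size difference concrete, here is a minimal check that assumes nothing beyond the standard struct module. The variable names are only illustrative, and the native size it reports depends on your platform.

```python
import struct
import time

# Native mode: the size of "L" follows the platform's C unsigned long.
native_size = struct.calcsize("L")

# Standard mode with explicit little-endian order: "L" is always 4 bytes.
standard_size = struct.calcsize("<L")

# Mask to 32 bits so the value always fits in the 4-byte field.
timestamp = int(time.time()) & 0xFFFFFFFF
packed = struct.pack("<L", timestamp)

print("native L:", native_size, "bytes")
print("standard <L:", standard_size, "bytes")
print("packed timestamp length:", len(packed))
```

If the first number printed is 8, a bare L writes four extra bytes into the header, and the marshal reader then starts in the wrong place.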

    If you are using Python 3, then in addition to the timestamp you must write another 4-byte field, the source size, as the following code shows (importlib/_bootstrap_external.py, Python 3.6.4):

    def _code_to_bytecode(code, mtime=0, source_size=0):
        """Compile a code object into bytecode for writing out to a byte-compiled
        file."""
        data = bytearray(MAGIC_NUMBER)
        data.extend(_w_long(mtime))
        data.extend(_w_long(source_size))
        data.extend(marshal.dumps(code))
        return data
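
Putting this together, a corrected dump for Python 3 might look like the sketch below. It is not the CPython implementation: it uses the public importlib.util.MAGIC_NUMBER and struct.pack("<L", ...) in place of the private _w_long helper, and it assumes that Python 3.7 and later expect an extra 4-byte flags field right after the magic number (0 meaning a timestamp-based pyc). The mtime and source_size defaults and the demo source string are made up for illustration.

```python
import importlib.util
import marshal
import os
import struct
import sys
import tempfile


def dump(co, filename, mtime=0, source_size=0):
    """Write the code object `co` to `filename` as a pyc file."""
    with open(filename, "wb") as f:
        f.write(importlib.util.MAGIC_NUMBER)
        if sys.version_info >= (3, 7):
            # Assumption: 3.7+ headers carry a flags field; 0 = timestamp-based.
            f.write(struct.pack("<L", 0))
        # Both fields are 4 bytes, little-endian, regardless of platform.
        f.write(struct.pack("<L", mtime & 0xFFFFFFFF))
        f.write(struct.pack("<L", source_size & 0xFFFFFFFF))
        marshal.dump(co, f)


source = "answer = 6 * 7\n"
dco = compile(source, "demo.py", "exec")

with tempfile.TemporaryDirectory() as tmp:
    pyc_path = os.path.join(tmp, "demo.pyc")
    dump(dco, pyc_path, source_size=len(source))
    pyc_size = os.path.getsize(pyc_path)

    # Import the pyc directly; the .pyc suffix selects the sourceless loader.
    spec = importlib.util.spec_from_file_location("demo", pyc_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    result = module.answer

print(result)
```

Unlike the function in the question, this version actually uses both of its parameters, so the caller decides which code object is written and where.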
    

    Finally, you can refer to pkgutil.read_code(), which reads a pyc file by performing the write steps in reverse order.
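
As a rough sketch of that reverse direction, the snippet below reads a pyc by hand instead of calling pkgutil.read_code(). It assumes the header is 16 bytes on Python 3.7 and later and 12 bytes on Python 3.3 to 3.6. load_code_from_pyc is a hypothetical helper name, and py_compile is used here only to produce a known-good file to read.

```python
import importlib.util
import marshal
import os
import py_compile
import sys
import tempfile


def load_code_from_pyc(path):
    """Return the code object stored in the pyc file at `path`."""
    # Assumption: 16-byte header on 3.7+, 12-byte header on 3.3 to 3.6.
    header_size = 16 if sys.version_info >= (3, 7) else 12
    with open(path, "rb") as f:
        header = f.read(header_size)
        if header[:4] != importlib.util.MAGIC_NUMBER:
            raise ValueError("pyc was written by a different Python version")
        # Everything after the header is the marshalled code object.
        return marshal.load(f)


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "demo.py")
    pyc_path = os.path.join(tmp, "demo.pyc")
    with open(src_path, "w") as f:
        f.write("answer = 6 * 7\n")
    # Let the standard library write the pyc, then read it back manually.
    py_compile.compile(src_path, cfile=pyc_path, doraise=True)
    code = load_code_from_pyc(pyc_path)

namespace = {}
exec(code, namespace)
print(namespace["answer"])
```

If the header size is wrong by even a few bytes, marshal.load starts in the middle of the header and fails with the same "bad marshal data" error as in the question.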

    === UPDATED END ===

    By the way, your code has another defect: the dump function never uses its co and filename parameters. It marshals the global dco and always writes to merge.pyc.