
Code pickled by PyPy cannot be unpickled by CPython


I have a piece of code that is pickled by PyPy. However, PyPy adds its own opcodes on top of CPython's opcode set, so CPython cannot use the unpickled code: running it raises SystemError: unknown opcode.

It's caused by the PyPy-specific opcodes LOOKUP_METHOD and CALL_METHOD; see the PyPy documentation.

I am wondering how to make PyPy generate exactly the standard CPython bytecode instead of its own opcodes. I looked through the docs and found the PYTHONOPTIMIZE environment variable; I set it to 0, but it did not help.

P.S. I cannot change the unpickling side; it has to be CPython 2.7.
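To see which PyPy-only opcodes a function's bytecode actually contains, a short scanner over the raw co_code is enough. This is a sketch that assumes the Python 2.7 bytecode layout (one opcode byte, plus a two-byte little-endian argument when the opcode is >= HAVE_ARGUMENT) and the opcode numbers 201 to 204 from PyPy 2.7's opcode.py; check them against opcode.opmap on your PyPy before relying on it. The helper name find_pypy_opcodes is hypothetical. It accepts any bytes-like object, so under Python 2 you can pass bytearray(func.__code__.co_code).

```python
# Layout and opcode numbers are assumptions taken from CPython/PyPy 2.7;
# verify them with opcode.HAVE_ARGUMENT and opcode.opmap on your PyPy.
HAVE_ARGUMENT = 90

PYPY_ONLY_OPCODES = {
    201: "LOOKUP_METHOD",
    202: "CALL_METHOD",
    203: "BUILD_LIST_FROM_ARG",
    204: "JUMP_IF_NOT_DEBUG",
}


def find_pypy_opcodes(code):
    """Return (offset, name) for every PyPy-only opcode in 2.7-style bytecode."""
    code = bytearray(code)  # items are ints under both Python 2 and 3
    found = []
    i = 0
    while i < len(code):
        op = code[i]
        if op in PYPY_ONLY_OPCODES:
            found.append((i, PYPY_ONLY_OPCODES[op]))
        # Opcodes with an argument occupy 3 bytes, the others 1 byte.
        i += 3 if op >= HAVE_ARGUMENT else 1
    return found


# Hand-assembled bytecode for `return obj.meth(x)`, modelled on PyPy's
# method-call opcodes (LOAD_FAST=124, RETURN_VALUE=83 as in CPython 2.7).
sample = bytearray([124, 0, 0,   # LOAD_FAST 0 (obj)
                    201, 0, 0,   # LOOKUP_METHOD 0 (meth)
                    124, 1, 0,   # LOAD_FAST 1 (x)
                    202, 1, 0,   # CALL_METHOD 1
                    83])         # RETURN_VALUE
print(find_pypy_opcodes(sample))  # [(3, 'LOOKUP_METHOD'), (9, 'CALL_METHOD')]
```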

update 1

As a comment pointed out, CPython's standard pickle module cannot pickle or unpickle code objects; that's right. I am using the cloudpickle library to pickle and unpickle function objects, and cloudpickle can pickle code objects.

The problem is that the co_code attribute is different in PyPy: it contains special opcodes that are only defined in PyPy.
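The pickle carries PyPy's opcodes because a code object is serialized with its raw co_code bytes copied verbatim; nothing is recompiled on the unpickling side. The sketch below shows the same property with the Python 3 standard library instead of cloudpickle: it registers a reducer for types.CodeType through copyreg that round-trips the code object via marshal. This is a swapped-in technique (cloudpickle instead passes the individual co_* fields to types.CodeType), and marshal's format is itself interpreter-specific, so it illustrates the problem rather than solving it. The names reduce_code and add are hypothetical.

```python
import copyreg
import marshal
import pickle
import types


def reduce_code(code):
    # On load, pickle calls marshal.loads(payload); the co_code bytes travel
    # unchanged inside the marshal payload.
    return marshal.loads, (marshal.dumps(code),)


# Registers the reducer globally, for every pickler in this process.
copyreg.pickle(types.CodeType, reduce_code)


def add(a, b):
    return a + b


payload = pickle.dumps(add.__code__)
code = pickle.loads(payload)

# Same bytecode as the original, byte for byte.
assert code.co_code == add.__code__.co_code

rebuilt = types.FunctionType(code, globals())
print(rebuilt(2, 3))  # 5
```

Whatever the transport, the receiving interpreter executes exactly the bytes the sender produced, which is why the opcodes have to be rewritten before pickling.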

update 2

I adopted the method provided by @ecatmur, and it works perfectly except for BUILD_LIST_FROM_ARG.

Here is my code:

class my_func(object):
    def __init__(self, resources):
        self.file_resource = resources[0]
        self.table_resource = resources[1]

        self.valid_ids = [int(l) for l in self.file_resource]
        self.valid_ids.extend([int(l[0]) for l in self.table_resource]) # issue line

After pickling with the modified cloudpickle on the PyPy side, I unpickle on the CPython side:

c = pickle.loads('**the pypy pickled code**')
c([['0'], [['1']]])

but this error is raised:

in __init__(self, resources)
    453 
    454                 self.valid_ids = [int(l) for l in self.file_resource]
--> 455                 self.valid_ids.extend([int(l[0]) for l in self.table_resource])
    456 
    457             def __call__(self, arg):

TypeError: 'int' object has no attribute '__getitem__'

I checked the bytecode with dis.dis, and it's weird: it looks correct.

If I pickle with CPython, unpickling works correctly.

Any ideas about update 2?


Solution

  • There aren't any options to disable the LOOKUP_METHOD optimization. You could try disabling astcompiler.PythonCodeGenerator._optimize_method_call(), but I think it would be safer to patch the bytecode as you pickle it. Fortunately, this is easy, because LOAD_ATTR and CALL_FUNCTION take the same arguments as LOOKUP_METHOD and CALL_METHOD and appear in the corresponding positions:

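    # Python 2 only, and it must run on PyPy: CPython's opcode.opmap has no
    # LOOKUP_METHOD, CALL_METHOD, BUILD_LIST_FROM_ARG or JUMP_IF_NOT_DEBUG.
    from cloudpickle import CloudPickler, PY3
    import opcode

    HAVE_ARGUMENT = opcode.HAVE_ARGUMENT
    NOP = opcode.opmap['NOP']
    LOOKUP_METHOD = opcode.opmap['LOOKUP_METHOD']
    CALL_METHOD = opcode.opmap['CALL_METHOD']
    LOAD_ATTR = opcode.opmap['LOAD_ATTR']
    CALL_FUNCTION = opcode.opmap['CALL_FUNCTION']
    BUILD_LIST_FROM_ARG = opcode.opmap['BUILD_LIST_FROM_ARG']
    BUILD_LIST = opcode.opmap['BUILD_LIST']
    ROT_TWO = opcode.opmap['ROT_TWO']
    JUMP_IF_NOT_DEBUG = opcode.opmap['JUMP_IF_NOT_DEBUG']
    JUMP_FORWARD = opcode.opmap['JUMP_FORWARD']
    JUMP_ABSOLUTE = opcode.opmap['JUMP_ABSOLUTE']

    def pypy_to_cpython(code):
        """Rewrite PyPy-only opcodes in a co_code string into CPython 2.7 ones.

        EXTENDED_ARG is not handled, so the patched bytecode must stay
        shorter than 65536 bytes.
        """
        code = [ord(c) for c in code]  # Python 2: co_code is a str
        i = 0
        while i < len(code):
            if code[i] == LOOKUP_METHOD:
                # Same argument (index into co_names): swap in place.
                code[i] = LOAD_ATTR
            elif code[i] == CALL_METHOD:
                # Same argument (number of positional arguments).
                code[i] = CALL_FUNCTION
            elif code[i] == BUILD_LIST_FROM_ARG:
                # Equivalent to BUILD_LIST 0; ROT_TWO, which needs 4 bytes
                # instead of 3: jump to a block appended at the end, then jump
                # back to i + 3, so no existing offset moves.
                end = len(code)
                back = i + 3
                code[i:i + 3] = [JUMP_ABSOLUTE, end % 256, end // 256]
                code.extend([BUILD_LIST, 0, 0, ROT_TWO,
                             JUMP_ABSOLUTE, back % 256, back // 256])
            elif code[i] == JUMP_IF_NOT_DEBUG:
                # __debug__ is evaluated here, on the pickling side.
                if __debug__:
                    code[i:i + 3] = [NOP, NOP, NOP]
                else:
                    code[i] = JUMP_FORWARD  # same relative jump argument
            # Opcodes with an argument occupy 3 bytes, the others 1 byte.
            i += (3 if code[i] >= HAVE_ARGUMENT else 1)
        return ''.join(chr(c) for c in code)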
    

    Note: there are also BUILD_LIST_FROM_ARG and JUMP_IF_NOT_DEBUG. The former is equivalent to BUILD_LIST 0 followed by ROT_TWO, while the latter is equivalent to a no-op in debug mode and to JUMP_FORWARD when not in debug mode. The tricky part is that BUILD_LIST 0; ROT_TWO takes four bytes where BUILD_LIST_FROM_ARG takes three, so a naive substitution would shift every later offset and force you to recalculate absolute jump targets and line numbers (co_lnotab). The workaround is to append any longer replacement sequence to the end of the function's bytecode, jump there, and jump back.

    Then subclass (or monkey-patch) cloudpickle.CloudPickler to call your opcode patcher:

    import types

    from cloudpickle import CloudPickler, PY3


    class MyPickler(CloudPickler):
        # Copy the table so the override below does not change CloudPickler itself.
        dispatch = CloudPickler.dispatch.copy()

        def save_codeobject(self, obj):
            """
            Save a code object, rewriting PyPy-only opcodes in co_code first.
            """
            if PY3:
                # Kept from cloudpickle's original; note that pypy_to_cpython
                # as written only handles Python 2 str bytecode.
                args = (
                    obj.co_argcount, obj.co_kwonlyargcount, obj.co_nlocals, obj.co_stacksize,
                    obj.co_flags, pypy_to_cpython(obj.co_code), obj.co_consts, obj.co_names, obj.co_varnames,
                    obj.co_filename, obj.co_name, obj.co_firstlineno, obj.co_lnotab, obj.co_freevars,
                    obj.co_cellvars
                )
            else:
                args = (
                    obj.co_argcount, obj.co_nlocals, obj.co_stacksize, obj.co_flags, pypy_to_cpython(obj.co_code),
                    obj.co_consts, obj.co_names, obj.co_varnames, obj.co_filename, obj.co_name,
                    obj.co_firstlineno, obj.co_lnotab, obj.co_freevars, obj.co_cellvars
                )
            self.save_reduce(types.CodeType, args, obj=obj)
        dispatch[types.CodeType] = save_codeobject
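The append-and-jump-back trick from the note above can be checked by hand with a toy model. This sketch runs on any Python 3 and assumes the CPython 2.7 opcode numbers (LOAD_CONST=100, BUILD_LIST=103, ROT_TWO=2, JUMP_ABSOLUTE=113, RETURN_VALUE=83) and PyPy's BUILD_LIST_FROM_ARG=203. The hypothetical run function is a tiny stack machine that implements only those opcodes; it is not CPython's evaluator. The hypothetical trampoline function patches one BUILD_LIST_FROM_ARG the same way pypy_to_cpython does, and the example checks that the patched code leaves the same stack as the original.

```python
# Opcode numbers are assumptions from CPython 2.7 / PyPy 2.7.
HAVE_ARGUMENT = 90
ROT_TWO, RETURN_VALUE = 2, 83
LOAD_CONST, BUILD_LIST, JUMP_ABSOLUTE = 100, 103, 113
BUILD_LIST_FROM_ARG = 203  # PyPy-only


def trampoline(code, i, replacement):
    """Replace the 3-byte instruction at offset i with a jump to `replacement`,
    which is appended at the end and followed by a jump back to i + 3."""
    code = list(code)
    end, back = len(code), i + 3
    code[i:i + 3] = [JUMP_ABSOLUTE, end % 256, end // 256]
    code.extend(replacement + [JUMP_ABSOLUTE, back % 256, back // 256])
    return code


def run(code, consts):
    """Toy stack machine for the opcodes above; returns the value stack
    when RETURN_VALUE is reached."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        if op >= HAVE_ARGUMENT:
            arg = code[pc + 1] | (code[pc + 2] << 8)  # little-endian 16-bit arg
            pc += 3
        else:
            arg = None
            pc += 1
        if op == LOAD_CONST:
            stack.append(consts[arg])
        elif op == BUILD_LIST:
            items = stack[len(stack) - arg:]  # empty when arg == 0
            del stack[len(stack) - arg:]
            stack.append(items)
        elif op == ROT_TWO:
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == BUILD_LIST_FROM_ARG:
            # PyPy semantics: push an empty list *under* the top of stack.
            top = stack.pop()
            stack.extend([[], top])
        elif op == JUMP_ABSOLUTE:
            pc = arg
        elif op == RETURN_VALUE:
            return stack
        else:
            raise ValueError("unsupported opcode %d" % op)


pypy_code = [LOAD_CONST, 0, 0, BUILD_LIST_FROM_ARG, 0, 0, RETURN_VALUE]
patched = trampoline(pypy_code, 3, [BUILD_LIST, 0, 0, ROT_TWO])

print(patched)  # [100, 0, 0, 113, 7, 0, 83, 103, 0, 0, 2, 113, 6, 0]
print(run(pypy_code, ["seq"]) == run(patched, ["seq"]))  # True
```

Because the jump back lands on i + 3, RETURN_VALUE stays at offset 6 and the LOAD_CONST at offset 0 is untouched; that is why existing absolute jump targets and co_lnotab remain valid after the patch.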