Tags: python, copy, new-operator, python-internals

Unable to deepcopy a class with both __init__ and __new__ defined


I'm having (what seems to me) a slightly weird problem. I have a class that defines both __init__ and __new__, shown below:

class Test:

    def __init__(self, num1):
        self.num1 = num1

    def __new__(cls, *args, **kwargs):
        new_inst = object.__new__(cls)
        new_inst.__init__(*args, **kwargs)
        new_inst.extra = 2
        return new_inst

If put to normal use, this works fine:

test = Test(1)
assert test.extra == 2

However, copy.deepcopy fails on it:

import copy
copy.deepcopy(test)

gives

TypeError: __init__() missing 1 required positional argument: 'num1'

This may be related to "Decorating class with class wrapper and __new__". I can't see exactly how, but I'm trying to do something similar here: I need __new__ to apply a class wrapper to the Test instance I've created.

Any help gratefully received!


Solution

  • Technically it's not an issue to call __init__ from __new__, but it's redundant: __init__ is called automatically once __new__ returns an instance of the class, so here it runs twice for every Test(...) call.
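To make the redundancy concrete, here is a minimal sketch that counts __init__ calls. The init_calls counter is a hypothetical addition for illustration only; it is not part of the original class.

```python
class Test:
    init_calls = 0  # hypothetical counter, added only for this illustration

    def __init__(self, num1):
        type(self).init_calls += 1
        self.num1 = num1

    def __new__(cls, *args, **kwargs):
        new_inst = object.__new__(cls)
        new_inst.__init__(*args, **kwargs)  # explicit call from __new__
        new_inst.extra = 2
        return new_inst


# Calling the class runs __new__, then runs __init__ again on the result.
test = Test(1)
print(Test.init_calls)  # 2
```

The second call is harmless here because __init__ only assigns an attribute, but it would matter for an __init__ with side effects.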


    Now coming to why deepcopy fails, we can look into its internals a bit.

    When __deepcopy__ isn't defined on the class, copy.deepcopy falls through to this branch:

    reductor = getattr(x, "__reduce_ex__", None)
    rv = reductor(4)
    

    Here reductor(4) returns the function used to re-create the object, a tuple holding the type of the object (Test) followed by the arguments to pass, and the object's state (in this case the items in the instance dictionary test.__dict__):

    >>> !rv
    (
        <function __newobj__ at 0x7f491938f1e0>,  # func
        (<class '__main__.Test'>,),  # type + args in a single tuple
        {'num1': 1, 'extra': 2}, None, None) # state
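The same tuple can be inspected outside a debugger. This is a small sketch using the class from the question; it only assumes that the first three items of the result are the function, the arguments and the state, as described above.

```python
import copyreg


class Test:
    def __init__(self, num1):
        self.num1 = num1

    def __new__(cls, *args, **kwargs):
        new_inst = object.__new__(cls)
        new_inst.__init__(*args, **kwargs)
        new_inst.extra = 2
        return new_inst


test = Test(1)
rv = test.__reduce_ex__(4)  # 4 is the pickle protocol version
func, args, state = rv[0], rv[1], rv[2]

print(func is copyreg.__newobj__)  # the re-creation function
print(args)                        # the class, followed by any extra arguments
print(state)                       # contents of test.__dict__
```

Note that args contains only the class itself: nothing in it corresponds to num1.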
    

    Now it calls _reconstruct with this data:

    def _reconstruct(x, memo, func, args,
                     state=None, listiter=None, dictiter=None,
                     deepcopy=deepcopy):
        deep = memo is not None
        if deep and args:
            args = (deepcopy(arg, memo) for arg in args)
        y = func(*args)
        ...
    

    Here this call will end up calling:

    def __newobj__(cls, *args):
        return cls.__new__(cls, *args)
    

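    But since args is empty and cls is <class '__main__.Test'>, this amounts to Test.__new__(Test) with no further arguments. Your __new__ then calls new_inst.__init__() with nothing to pass for num1, and you get the error.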
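The failing step can be reproduced without copy.deepcopy by calling copyreg.__newobj__ directly. The sketch below also includes TestNoExplicitInit, a hypothetical variant of the class (not from the question) that drops the explicit __init__ call, to show that the call inside __new__ is what trips the reconstruction.

```python
import copy
import copyreg


class Test:
    def __init__(self, num1):
        self.num1 = num1

    def __new__(cls, *args, **kwargs):
        new_inst = object.__new__(cls)
        new_inst.__init__(*args, **kwargs)  # fails when args is empty
        new_inst.extra = 2
        return new_inst


class TestNoExplicitInit:
    # Hypothetical variant: same class without the explicit __init__ call.
    def __init__(self, num1):
        self.num1 = num1

    def __new__(cls, *args, **kwargs):
        new_inst = object.__new__(cls)
        new_inst.extra = 2
        return new_inst


message = None
try:
    copyreg.__newobj__(Test)  # effectively what _reconstruct does
except TypeError as exc:
    message = str(exc)
    print(message)

# Without the explicit call, __new__ accepts zero arguments and the
# attributes are restored from the saved state afterwards.
clone = copy.deepcopy(TestNoExplicitInit(1))
print(clone.num1, clone.extra)  # 1 2
```

Whether dropping the call is acceptable depends on what __new__ needs from the initialised instance, for example in the class-wrapper case the question mentions.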


    So how does Python decide which arguments to pass for your object, since that seems to be the problem?

    For that we need to look at reductor(4), where reductor is __reduce_ex__ and the 4 passed here is the pickle protocol version.

    __reduce_ex__ internally calls reduce_newobj to get the object-creation function, the arguments, the state, etc. for the new copy.

    The arguments themselves are obtained via _PyObject_GetNewArguments.

    This function looks for __getnewargs_ex__ or __getnewargs__ on the class. Since our class has neither, we get no arguments.
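The effect of that lookup is visible from Python. In this sketch, Plain and WithNewArgs are hypothetical classes made up for the comparison; the only difference between them is the __getnewargs__ method.

```python
class Plain:
    def __init__(self, num1):
        self.num1 = num1


class WithNewArgs(Plain):
    def __getnewargs__(self):
        # Arguments to hand to __new__ when the object is re-created.
        return (self.num1,)


# Index 1 of the reduce tuple holds the class plus the __new__ arguments.
plain_args = Plain(1).__reduce_ex__(4)[1]
with_args = WithNewArgs(1).__reduce_ex__(4)[1]

print(plain_args == (Plain,))          # True: only the class
print(with_args == (WithNewArgs, 1))   # True: class plus num1
```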


    Now let's add this method and try again:

    import copy
    
    
    class Test:
    
        def __init__(self, num1):
            self.num1 = num1
    
        def __getnewargs__(self):
            return ('Eggs',)
    
        def __new__(cls, *args, **kwargs):
            print(args)
            new_inst = object.__new__(cls)
            new_inst.__init__(*args, **kwargs)
            new_inst.extra = []
            return new_inst
    
    test = Test([])
    
    xx = copy.deepcopy(test)
    
    print(xx.num1, test.num1, id(xx.num1), id(test.num1))
    
    # ([],)
    # ('Eggs',)
    # [] [] 139725263987016 139725265534088
    

    Surprisingly, the deep copy xx doesn't have 'Eggs' stored in num1 even though we return it from __getnewargs__. This is because _reconstruct applies a deep copy of the original state to the new instance after creating it, overwriting whatever __new__ and __init__ set.

    
    def _reconstruct(x, memo, func, args,
                     state=None, listiter=None, dictiter=None,
                     deepcopy=deepcopy):
        deep = memo is not None
        if deep and args:
            args = (deepcopy(arg, memo) for arg in args)
        y = func(*args)
        if deep:
            memo[id(x)] = y
    
        if state is not None:
            ...
                if state is not None:
                    y.__dict__.update(state)  # <--- overwrites attributes set during creation
        ...
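Since 'Eggs' was only a placeholder, a more practical variant would return the real constructor argument. The following is a sketch of that idea rather than a definitive fix; it assumes num1 is the only argument __init__ needs.

```python
import copy


class Test:
    def __init__(self, num1):
        self.num1 = num1

    def __getnewargs__(self):
        # Assumption: num1 is all that __new__/__init__ require.
        return (self.num1,)

    def __new__(cls, *args, **kwargs):
        new_inst = object.__new__(cls)
        new_inst.__init__(*args, **kwargs)
        new_inst.extra = []
        return new_inst


test = Test([1, 2])
clone = copy.deepcopy(test)

print(clone.num1 == test.num1)      # True: equal contents
print(clone.num1 is test.num1)      # False: an independent list
print(clone.extra is test.extra)    # False: also copied
```

The argument and the state share one memo during the copy, so num1 is deep-copied once and the same new list is used in both places.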
    

    Are there any other ways to do it?

    Note that the explanation and the working example above are only meant to explain the issue. I wouldn't call this either the best or the worst way to do it.

    Yes, you could define your own __deepcopy__ hook on the class to control the behavior further. I'll leave this as an exercise for the reader.
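For reference, one possible shape of such a hook is sketched below. It is an assumption about what would suit this class, not the only way to write it: it bypasses Test.__new__ entirely and copies the instance dictionary attribute by attribute.

```python
import copy


class Test:
    def __init__(self, num1):
        self.num1 = num1

    def __new__(cls, *args, **kwargs):
        new_inst = object.__new__(cls)
        new_inst.__init__(*args, **kwargs)
        new_inst.extra = 2
        return new_inst

    def __deepcopy__(self, memo):
        # Create a bare instance without going through Test.__new__,
        # so no constructor arguments are needed.
        clone = object.__new__(type(self))
        memo[id(self)] = clone  # register early so cyclic references resolve
        for name, value in self.__dict__.items():
            setattr(clone, name, copy.deepcopy(value, memo))
        return clone


test = Test([1, 2])
clone = copy.deepcopy(test)
print(clone.num1, clone.extra)
```

Because __new__ is skipped, any extra work it performs (such as applying a wrapper) would have to be repeated inside __deepcopy__ if the copy needs it.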