
How to clone / deepcopy Python 3.x dict with internal references to the objects


I have the following problem. Say we have class A and class B:

class A:

    def clone(self):

        return self.__class__()

class B:

    def __init__(self, ref):

        self.ref = ref

    def clone(self):

        return self.__class__(
            ref = self.ref
        )

I also have a class called Holder that inherits from dict.

class Holder(dict):

    def clone(self):

        return self.__class__(
            {k: v.clone() for k, v in self.items()}
        )

What I want is a way to copy the whole dict (with the values already stored in it) using my clone() method, so that the references between the stored objects are preserved.

And here's some code that should clarify the behaviour that I want:

original = Holder()
original['a'] = A()
original['b'] = B(original['a'])  # here we create B object 
                                  # with reference to A object

assert original['a'] is original['b'].ref  # reference is working

copy = original.clone()  # we clone our dict

assert copy['a'] is copy['b'].ref  # fails: the reference is not preserved,
                                   # copy['b'].ref still points to the old original['a']

assert original['a'] is not copy['a']
assert original['b'] is not copy['b']
assert original['b'].ref is not copy['b'].ref

Here's some background to the problem described above:

Let's say that I have a class called MyClass and metaclass called MyClassMeta.

I want the __prepare__ method of MyClassMeta to return my own dict, which will be an instance of a class called Holder. During class creation I will store values of certain types in the internal dict of the Holder instance (similar to what EnumMeta does). Since the Holder instance is filled with values during class creation, all instances of MyClass will hold a reference to the same object.

What I want instead is a separate copy of the Holder per instance. I thought I could just copy/clone the object, but a problem came up when I added an object that references another object inside the same dict.


Solution

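  • The usual way to customize how a custom data structure is cloned in Python is to implement the __deepcopy__ special method. The copy.deepcopy function calls it when it is defined, and records the returned object in its memo so that later references to the same original resolve to the same copy.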

    As explained in the documentation of the copy module:

    Two problems often exist with deep copy operations that don’t exist with shallow copy operations:

    • Recursive objects (compound objects that, directly or indirectly, contain a reference to themselves) may cause a recursive loop.
    • Because deep copy copies everything it may copy too much, such as data which is intended to be shared between copies. [This is the problem you are facing]

    The deepcopy() function avoids these problems by:

    • keeping a “memo” dictionary of objects already copied during the current copying pass; and
    • letting user-defined classes override the copying operation or the set of components copied.
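
To make the memo mechanism concrete, here is a minimal sketch. It assumes A and B are plain classes with no special construction logic (an assumption, not something the question guarantees); in that case the default copy.deepcopy already keeps internal references consistent, even without a custom __deepcopy__:

```python
import copy

class A:
    pass

class B:
    def __init__(self, ref):
        self.ref = ref

class Holder(dict):
    pass

original = Holder()
original['a'] = A()
original['b'] = B(original['a'])

# One deepcopy call means one memo, so the A instance is copied only once
cp = copy.deepcopy(original)

# Passing the same memo explicitly has the same effect across calls
memo = {}
first = copy.deepcopy(original['a'], memo)
second = copy.deepcopy(original['a'], memo)

assert type(cp) is Holder            # the dict subclass is preserved
assert cp['a'] is cp['b'].ref        # the internal reference is preserved
assert cp['a'] is not original['a']  # but it is a new object
assert first is second               # the memo returned the existing copy
```

A custom __deepcopy__ becomes necessary only when the copy needs different behaviour from this default, for example when some attributes must be rebuilt or shared rather than copied.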

    Code

    import copy
    
    class A:
        def __deepcopy__(self, memo):
            return self.__class__()
    
    class B:
        def __init__(self, ref):
            self.ref = ref
    
        def __deepcopy__(self, memo):
            # Pass memo along so an object that was already copied is reused
            return self.__class__(
                ref=copy.deepcopy(self.ref, memo)
            )
    
    class Holder(dict):
        def __deepcopy__(self, memo):
            # Copy every value with the same memo so shared objects stay shared
            return self.__class__(
                {k: copy.deepcopy(v, memo) for k, v in self.items()}
            )
    

    Test

    import copy
    
    original = Holder()
    original['a'] = A()
    original['b'] = B(original['a'])  # here we create B object
                                      # with reference to A object
    
    assert original['a'] is original['b'].ref  # reference is working
    
    cp = copy.deepcopy(original)  # we clone our dict
    
    assert cp['a'] is cp['b'].ref  # reference is still working
    
    assert original['a'] is not cp['a']
    assert original['b'] is not cp['b']
    assert original['b'].ref is not cp['b'].ref
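
One caveat, sketched under assumptions: copy.deepcopy records a copy in the memo only after __deepcopy__ returns. If the objects can reference each other in a cycle, a __deepcopy__ that recurses before its new object exists would recurse without end. A common pattern is to create the new object first, register it under memo[id(self)], and only then copy the attributes. Node below is a hypothetical class used for illustration; it is not part of the question:

```python
import copy

class Node:  # hypothetical class, for illustration only
    def __init__(self, ref=None):
        self.ref = ref

    def __deepcopy__(self, memo):
        new = self.__class__()
        # Register the copy before recursing, so a cycle leading back
        # to self finds it in the memo instead of recursing again
        memo[id(self)] = new
        new.ref = copy.deepcopy(self.ref, memo)
        return new

a = Node()
b = Node(a)
a.ref = b  # a -> b -> a forms a cycle

a2 = copy.deepcopy(a)

assert a2 is not a
assert a2.ref is not b
assert a2.ref.ref is a2  # the cycle is reproduced among the copies
```

For the acyclic A/B/Holder layout in the question this extra step should not be needed.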