I am writing custom classes that inherit from either the built-in dict class or collections.Counter, and I am running into a problem with the behaviour of deepcopy. Basically, deepcopy works as intended when inheriting from dict but not when inheriting from Counter.

Here is an example:

    from copy import deepcopy
    from collections import Counter

    class MyCounter(Counter):
        def __init__(self, foo):
            self.foo = foo

    class MyDict(dict):
        def __init__(self, foo):
            self.foo = foo

    c = MyCounter(0)
    assert c.foo == 0  # Success
    c1 = deepcopy(c)
    assert c1.foo == 0  # Failure

    d = MyDict(0)
    assert d.foo == 0  # Success
    d1 = deepcopy(d)
    assert d1.foo == 0  # Success

I am a bit clueless as to why this is happening, given that the source code of the Counter class does not seem to change anything about deepcopy (there is no custom __deepcopy__ method, for instance).

I understand that I may have to write a custom __deepcopy__ method, but it's not clear to me how. In general I would rather not have to, given that it works perfectly for dict.

Any help will be much appreciated.

deepcopy has several fallbacks, covered in the answer here. In this case, your base class Counter specializes pickle serialization, which is what deepcopy picks up on (as the second option, since no __deepcopy__ specialization exists).

If you step through the code in a debugger, you'll find it ends up in Counter's __reduce__ method, which in the Python 3.9 implementation of Counter is:

    def __reduce__(self):
        return self.__class__, (dict(self),)

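Here we can see where the information is lost: Counter's implementation relies on the object having no state other than the dictionary contents, so the copy is rebuilt by calling the class with that dictionary as its only argument. For MyCounter, that means the dictionary ends up in the foo parameter and the original value of foo is never copied.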
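
To make the loss visible, here is a small sketch that calls __reduce__ by hand and then runs deepcopy. It assumes the Counter.__reduce__ implementation quoted above (CPython); the values in the comments follow from that implementation and could differ if it changes.

```python
from collections import Counter
from copy import deepcopy


class MyCounter(Counter):
    def __init__(self, foo):
        self.foo = foo


c = MyCounter(0)
c["deep"] = 1

# Counter.__reduce__ hands back the class and a plain dict of the counts;
# the foo attribute is not part of the recipe.
factory, args = c.__reduce__()
print(factory is MyCounter)  # True
print(args)                  # ({'deep': 1},)

# deepcopy replays that recipe as MyCounter({'deep': 1}),
# so the dict lands in the foo parameter.
c1 = deepcopy(c)
print(c1.foo)                # {'deep': 1}

# This __init__ never forwards anything to Counter, so the counts are lost too.
print(dict(c1))              # {}
```

So the copy is not just missing foo: with this particular __init__, the counts themselves do not survive either.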
You could override __reduce__ or __reduce_ex__, which would fix pickling and, as a bonus, deepcopy as well, or you could override __deepcopy__ and provide the necessary implementation yourself.
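
For the first option, a minimal sketch of a __reduce__ override might look like the following. It keeps the original MyCounter(foo) constructor and uses the five-element form of the reduce tuple, where the last element is an iterator of key/value pairs that get re-inserted into the rebuilt object. The list passed as foo is only there to show that the attribute is copied deeply; this is one possible approach, not the only one.

```python
from collections import Counter
from copy import deepcopy


class MyCounter(Counter):
    def __init__(self, foo):
        self.foo = foo

    def __reduce__(self):
        # (callable, args, state, listitems, dictitems):
        # rebuild with MyCounter(foo), then re-insert the counts one by one.
        return self.__class__, (self.foo,), None, None, iter(self.items())


c = MyCounter(["some", "state"])
c["deep"] = 1
c["copy"] = 2

c1 = deepcopy(c)
assert c1.foo == c.foo and c1.foo is not c.foo  # foo was copied, not shared
assert dict(c1) == dict(c)                      # counts survived as well
```

Since pickle uses the same protocol, this should also make pickling carry foo along, although only deepcopy is exercised here.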
Implementing our own __deepcopy__ isn't too complex, and we can keep the code very simple:

    from copy import deepcopy
    from collections import Counter

    class MyCounter(Counter):
        def __init__(self, foo):
            self.foo = foo

        def __deepcopy__(self, memo):
            # Rebuild through the constructor so foo is carried over
            copy_instance = MyCounter(deepcopy(self.foo, memo))
            for key, val in self.items():
                copy_instance[deepcopy(key, memo)] = val  # val is just an int
            return copy_instance

    c = MyCounter(123)
    c['deep'] = 1
    c['copy'] = 2

    c1 = deepcopy(c)
    assert c1.foo == c.foo
    assert c1['deep'] == c['deep']
    assert c1['copy'] == c['copy']

(In most cases I would recommend against subclassing Counter or dict just to add more attributes to them, and would instead compose a custom class that holds a Counter or dict as an instance variable.)
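
As a rough illustration of that composition approach, the sketch below uses a hypothetical Tally class (the class name and its counts attribute are made up for this example). Because Tally is a plain object, deepcopy copies its attributes by default, and the inner Counter is copied through its own __reduce__ without any custom code.

```python
from collections import Counter
from copy import deepcopy


class Tally:
    """Hypothetical wrapper that owns a Counter instead of subclassing it."""

    def __init__(self, foo):
        self.foo = foo
        self.counts = Counter()


t = Tally(0)
t.counts["deep"] += 1
t.counts["copy"] += 2

t1 = deepcopy(t)
assert t1.foo == 0
assert t1.counts == t.counts      # same counts...
assert t1.counts is not t.counts  # ...held in an independent Counter
```

The trade-off is that Tally is no longer a mapping itself, so any dict-style access you want on it has to be forwarded to counts explicitly.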