I have a hierarchy of frozen dataclasses, and I need to implement hashing for these dataclasses such that the hashes of every unique instance across the whole hierarchy are unique. I am defining "unique" here to mean that either the fields or the type of two instances differ. However, since the default dataclass __hash__
is a function only of the dataclass fields and not of the type, instances of different dataclass types which share the same fields hash to the same value by default.
Below is a 3-level nested dataclass hierarchy.
```python
from dataclasses import dataclass
import abc


@dataclass(frozen=True)
class Element(abc.ABC):
    pass


@dataclass(frozen=True)
class StepType(Element, abc.ABC):
    @classmethod
    def name(cls):
        return cls.__name__


class Skip(StepType): pass
class Hop(StepType): pass


@dataclass(frozen=True)
class Stepper(Element, abc.ABC):
    step_type: StepType = Skip()
    foo: int = 1

    @abc.abstractmethod
    def step(self):
        pass


@dataclass(frozen=True)
class Single(Stepper):
    def step(self):
        return self.step_type.name() + " once"


@dataclass(frozen=True)
class Double(Stepper):
    def step(self):
        return self.step_type.name() + " twice"


@dataclass(frozen=True)
class Speed(Element, abc.ABC):
    @abc.abstractmethod
    def how_fast(self):
        pass


@dataclass(frozen=True)
class Slow(Speed):
    def how_fast(self):
        return "real slow"


@dataclass(frozen=True)
class Fast(Speed):
    def how_fast(self):
        return "quickly"


@dataclass(frozen=True)
class Walker:
    speed: Speed = Slow()
    stepper: Stepper = Single()

    def walk(self):
        return " ".join([self.stepper.step(), self.speed.how_fast()])
```
Here's a summary of the inheritance structure of `Element`:

```
Element
|- StepType
|  |- Skip
|  |- Hop
|
|- Stepper
|  |- Single
|  |- Double
|
|- Speed
   |- Slow
   |- Fast
```
Here's a summary of the nested fields structure of `Walker`:

```
Walker
|- stepper: Stepper
|  |- step_type: StepType
|
|- speed: Speed
```
Different `Walker` instances, and the `Element` instances they contain, hash to the same value:

```python
a = Walker(speed=Slow(), stepper=Single())
b = Walker(speed=Fast(), stepper=Double(step_type=Hop()))
print(f"a: {hash(a)}\nb: {hash(b)}")
```

```
>> a: -5704360693866892300
   b: -5704360693866892300
```
I would like the hashes of unique `Walker` instances to be reliably different while maintaining the various dataclass features. Some constraints:

- It doesn't have to be done through `__hash__`; it could be via creating new methods in `Walker` and/or `Element`. But it must be recursive, since `Element` instances can contain other `Element`s to arbitrary depths.
- Ideally, I'd like to leverage the reliable default dataclass `__hash__` rather than overriding it completely.
- I don't want to hash on `id(self)` like `object.__hash__` does.
- `Walker` must not inherit from `Element`.
- I cannot refactor any of the dataclasses into regular classes, Enums, etc. I must use the dataclass framework.

I think it would work well to add `repr(type(self))` to the hash function as well as the fields.
My first idea was something like:

```python
class Element:
    def hash(self):
        return hash(self) ^ hash(repr(type(self)))
```

But this doesn't recurse down to nested `Element`s.
This is the best idea I have at the time of posting:

```python
class Element:
    def __hash__(self):
        return hash(
            hash(repr(type(self))) ^
            (hash(
                tuple(hash((key, val)) for key, val in self.__dict__.items())
            ) ** 3)
        )
```

This does recurse, and it passes my unit tests, but it overrides the built-in dataclass `__hash__`. I'm worried this could create some broken corner cases. Any input on a more robust method, pointing out where this would break, or a simple thumbs up is appreciated.
One approach to reusing the built-in dataclass hash function while adding the current class to the hash is to define a hidden field (hidden with `init=False` and `repr=False`) whose default value is the current class. This can be done by setting the field in the `__init_subclass__` method of the base class, with the accompanying annotation of the hidden field set in the `__annotations__` attribute:

```python
from dataclasses import field

class Element:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Hidden field: excluded from __init__ and repr, but included in the
        # generated __eq__/__hash__, so the concrete type participates.
        cls._type = field(init=False, repr=False, default=cls)
        cls.__annotations__['_type'] = type
```

Demo here