Tags: python, inheritance, hash, python-dataclasses

How to override the hash function of python data classes?


I am trying to write a base class for Python dataclasses with a custom hash function, as shown below. However, calling hash() on an instance of the child class does not use the parent class's custom hash function.

import dataclasses
import joblib


@dataclasses.dataclass(frozen=True)
class HashableDataclass:

    def __hash__(self):
        print("Base class hash was called!")
        fields = dataclasses.fields(self)
        values = tuple(getattr(self, field.name) for field in fields)
        return int(joblib.hash(values), 16)


@dataclasses.dataclass(frozen=True)
class MyDataClass1(HashableDataclass):
    field1: int
    field2: str


obj1 = MyDataClass1(1, "Hello")
print(hash(obj1))

Is there a way to override the hash function of dataclasses?


Solution

  • You should check the documentation:

    If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).

    @dataclasses.dataclass(frozen=True, eq=False)  # <- HERE
    class MyDataClass1(HashableDataclass):
        field1: int
        field2: str
    

    Output:

    >>> obj1 = MyDataClass1(1, "Hello")
    >>> print(hash(obj1))
    Base class hash was called!
    1356025966893372872
    

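    According to the comment from @user2357112, you can (and probably should) instead keep eq enabled and assign the parent's __hash__ explicitly in the child class (see the comments for the reasons):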

    @dataclasses.dataclass(frozen=True)
    class MyDataClass1(HashableDataclass):
        __hash__ = HashableDataclass.__hash__  # <- HERE
        field1: int
        field2: str
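
To see how the two fixes above differ, here is a minimal sketch that compares all three variants side by side. The class names HashableBase, Generated, NoEq and ExplicitHash are hypothetical, and the built-in hash() stands in for joblib.hash purely so the example has no third-party dependency; that substitution is an assumption, not part of the original answer.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class HashableBase:
    """Stand-in for HashableDataclass from the question (hypothetical name)."""

    def __hash__(self):
        # Assumption: built-in hash() replaces joblib.hash to avoid the dependency.
        values = tuple(getattr(self, f.name) for f in dataclasses.fields(self))
        return hash(("custom", values))


@dataclasses.dataclass(frozen=True)
class Generated(HashableBase):
    # eq=True and frozen=True: dataclass() writes its own __hash__ on this class.
    field1: int
    field2: str


@dataclasses.dataclass(frozen=True, eq=False)
class NoEq(HashableBase):
    # eq=False: neither __hash__ nor __eq__ is generated for this class.
    field1: int
    field2: str


@dataclasses.dataclass(frozen=True)
class ExplicitHash(HashableBase):
    # An explicit __hash__ in the class body is kept; __eq__ is still generated.
    __hash__ = HashableBase.__hash__
    field1: int
    field2: str


for cls in (Generated, NoEq, ExplicitHash):
    print(
        cls.__name__,
        "| uses base hash:", cls.__hash__ is HashableBase.__hash__,
        "| has own __eq__:", "__eq__" in vars(cls),
    )
# Generated | uses base hash: False | has own __eq__: True
# NoEq | uses base hash: True | has own __eq__: False
# ExplicitHash | uses base hash: True | has own __eq__: True

a, b = ExplicitHash(1, "Hello"), ExplicitHash(1, "Hello")
print(a == b, hash(a) == hash(b), len({a, b}))
# True True 1
```

Both fixes route hash() to the parent's method, but only the explicit assignment keeps a field-by-field __eq__ on the child. With eq=False the child inherits whatever __eq__ its parent provides, which may not compare field1 and field2 at all, so equality and hashing can drift apart.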