Tags: python, nan, python-dataclasses

Testing dataclass objects for equality with a NaN attribute


Is there an easy way to handle NaN attributes when testing dataclass objects for equality? Here is my minimal example:

import pickle
from dataclasses import dataclass


@dataclass
class MyClass:
    a: float


mc = MyClass(float('nan'))

# Serialize and deserialize
mc2 = pickle.loads(pickle.dumps(mc))

assert mc2 == mc  # E   assert MyClass(a=nan) == MyClass(a=nan)

The assertion currently fails as follows:

Traceback (most recent call last):
  File "???.py", line 15, in <module>
    assert mc2 == mc  # E   assert MyClass(a=nan) == MyClass(a=nan)
AssertionError
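
For context, here is a minimal check of why the assertion fails, independent of dataclasses. It relies only on the standard `math` and `pickle` modules. NaN never compares equal to anything, including another NaN, so any field-by-field `==` on a NaN attribute reports a mismatch.

```python
import math
import pickle

nan = float("nan")

# IEEE 754: NaN is unordered, so == is False even against itself.
print(nan == nan)            # False
print(math.isnan(nan))       # True

# A pickle round trip yields a float that is still NaN, so comparing
# the restored value to the original with == is also False.
restored = pickle.loads(pickle.dumps(nan))
print(math.isnan(restored))  # True
print(restored == nan)       # False
```

This suggests that the dependable test for "both values are NaN" is `math.isnan` on each side rather than `==`.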

Solution

  • If you need to override the default equality logic for a float then you're starting to move away from what dataclasses were designed for. It's only a small move, though, and dataclasses still give you enough flexibility to do it.

    Dataclasses will honour any custom logic you write for your class: the @dataclass decorator does not overwrite an __eq__ that you define in the class body. So you are completely free to write your own implementation of __eq__ that does what you want. For instance:

    from dataclasses import dataclass
    from math import isnan
    
    @dataclass
    class MyClass:
        a: float
        b: float
    
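        def __eq__(self, other):
            # Let Python try the reflected comparison for unrelated types.
            if self.__class__ is not other.__class__:
                return NotImplemented
            return self.a == other.a and (
                self.b == other.b
                # NaN != NaN, so treat two NaN values of b as equal explicitly.
                or (isnan(self.b) and isnan(other.b))
            )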
    
    same_1 = MyClass(1, 2.5)
    same_2 = MyClass(1, 2.5)
    different_a = MyClass(2, 2.5)
    different_b = MyClass(1, 3.0)
    nan_1 = MyClass(1, float('nan'))
    nan_2 = MyClass(1, float('nan'))
    
    assert nan_1 is not nan_2 and nan_1.b is not nan_2.b, \
        "check equality operator cannot be short-circuited due to object identity"
    assert nan_1 == nan_2, "equal when both bs are nan"
    
    assert same_1 == same_1, "same object"
    assert same_1 == same_2, "different object, but equal attributes"
    assert same_1 != different_a, "different a attribute"
    assert same_1 != different_b, "different b attribute"
    assert same_1 != nan_1, "preserve nan inequality with other numbers"
    

    You could play about with metaprogramming to reduce the amount of code in your class. However, if you write the custom equality operator by hand then your code will remain clear: it is easy to see why you have implemented it and how it works. It will also be marginally more efficient, since no transient objects are created and no additional instance checks are needed.
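
For comparison, the metaprogramming route mentioned above might look like the following sketch. The names `NanEqMixin`, `_same_value` and `Reading` are hypothetical and exist only for illustration. The sketch assumes every compared field is either an ordinary value or a float, and it passes `eq=False` so that `@dataclass` does not generate an `__eq__` that would shadow the inherited one.

```python
from dataclasses import dataclass, fields
from math import isnan


def _same_value(x, y):
    """Hypothetical helper: equal in the usual sense, or both float NaN."""
    if x == y:
        return True
    return (
        isinstance(x, float)
        and isinstance(y, float)
        and isnan(x)
        and isnan(y)
    )


class NanEqMixin:
    """Hypothetical mixin: field-wise equality that treats NaN as equal to NaN."""

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        # Walk the dataclass fields, skipping any declared with compare=False.
        return all(
            _same_value(getattr(self, f.name), getattr(other, f.name))
            for f in fields(self)
            if f.compare
        )


# eq=False stops @dataclass from generating its own __eq__ on Reading.
@dataclass(eq=False)
class Reading(NanEqMixin):
    a: float
    b: float


nan_1 = Reading(1.0, float("nan"))
nan_2 = Reading(1.0, float("nan"))
print(nan_1 == nan_2)             # True
print(nan_1 == Reading(1.0, 2.5))  # False
```

This is shorter per class once several dataclasses share the mixin, but it builds a generator and calls `fields()` on every comparison, which is the overhead the hand-written operator avoids. Because the mixin defines `__eq__` without `__hash__`, instances are unhashable unless you also supply a hash.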