
Inheritance with dataclasses


I am trying to understand the good practices for using inheritance with dataclasses. Let's say I want an "abstract" parent class containing a set of variables and methods, and then a series of child classes that inherit these methods and variables, with each child giving the variables different default values.

from dataclasses import dataclass


@dataclass
class ParentClass:
    a_variable: str

    def a_function(self) -> None:
        print("I am a class")

# ONE
@dataclass
class DataclassChild1(ParentClass):
    a_variable: str = "DataclassChild"

# TWO
@dataclass
class DataclassChild2(ParentClass):
    def __init__(self) -> None:
        super().__init__(a_variable="Child")

# THREE
class ClassChild(ParentClass):
    def __init__(self) -> None:
        super().__init__(a_variable="Child")

What would be the correct way to implement this (one/two/three), if any? Or is it overkill, and would it be best to just use different instances of ParentClass, passing different values to the constructor?

I think I should also use the @dataclass decorator on the child classes, but if I check the type of the child class, it seems to be a dataclass even when I don't apply the decorator.

Plus, I feel like overriding __init__ defeats the purpose of using a dataclass in the first place, but on the other hand the standard dataclass syntax seems useless because it would mean having to redeclare all the variables in the child classes (a_variable: str = "DataclassChild").


Solution

  • I would argue that #1 is the most correct method. For the example you showed, it hardly matters which method you use, but once you add a second variable, the differences become apparent. This is implicitly confirmed by the Inheritance section in the documentation.

    from dataclasses import dataclass

    @dataclass
    class ParentClass:
        a: str
        b: str = "parent-b"
    
    # This works smoothly
    @dataclass
    class ChildClass1(ParentClass):
        a: str = "child-a"
    
    # This works, but is a maintenance nightmare: every parent field and default is copied by hand
    @dataclass
    class ChildClass2(ParentClass):
        def __init__(self, a="child-a", b="parent-b"):
            super().__init__(a, b)
    
    # This works, but it hides the real signature and only works if a is the first field
    @dataclass
    class ChildClass3(ParentClass):
        def __init__(self, a="child-a", **kwargs):
            super().__init__(a, **kwargs)
    

    The dataclass decorator adds generated methods, including __init__, to your class. With option #2 or #3 you write that __init__ yourself, so you have to know and copy the parent's signature for every parameter, and keep it in sync whenever the parent changes. Option #1, by contrast, lets you change the default for just a and leaves everything else to the generated code.
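
    To see what option #1 buys you, the sketch below inspects the generated __init__ with inspect.signature. It reuses ParentClass and ChildClass1 from above. StrictParent and BrokenChild are hypothetical additions that illustrate one caveat: a child can only give a field a default if every field after it already has one.

```python
from dataclasses import dataclass, fields
from inspect import signature


@dataclass
class ParentClass:
    a: str
    b: str = "parent-b"


@dataclass
class ChildClass1(ParentClass):
    a: str = "child-a"  # redeclare only the field whose default changes


# The field order is inherited from the parent; only the default of a differs.
field_names = [f.name for f in fields(ChildClass1)]
print(field_names)  # ['a', 'b']
print(signature(ChildClass1))  # shows the parameters of the generated __init__

child = ChildClass1()
print(child.a, child.b)  # child-a parent-b


# Hypothetical parent whose second field has no default.
@dataclass
class StrictParent:
    a: str
    b: str


# Giving a a default here would leave b (no default) after a defaulted field,
# which dataclass rejects with a TypeError when the class is created.
try:
    @dataclass
    class BrokenChild(StrictParent):
        a: str = "child-a"
except TypeError as exc:
    ordering_error = exc
else:
    ordering_error = None

print(type(ordering_error).__name__)  # TypeError
```

    In the failing case, the usual way out is to give the later parent fields defaults as well, or to fall back on one of the other approaches discussed here.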

    Another way to get the same result is to define a __post_init__ method on the child classes, which can then replace the parent's default value:

    from dataclasses import dataclass

    @dataclass
    class ParentClass:
        a: str = ''  # Or pick some other universally acceptable marker
    
    @dataclass
    class ChildClass(ParentClass):
        def __post_init__(self):
            if self.a == '':
                self.a = "child-a"
    

    This is needlessly complex for most scenarios, but may be useful in a more complicated situation. Normally, __post_init__ is meant for initializing derived fields, as in the example in the linked documentation.
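
    For comparison, here is a minimal sketch of the derived-field use of __post_init__. The Rectangle class and its field names are made up for illustration. field(init=False) keeps area out of the generated __init__, so it is always computed from the other two fields.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:  # hypothetical example class
    width: float
    height: float
    # Not an __init__ parameter; filled in by __post_init__ below.
    area: float = field(init=False)

    def __post_init__(self) -> None:
        # Called by the generated __init__ after the other fields are set.
        self.area = self.width * self.height


rect = Rectangle(2.0, 3.0)
print(rect.area)  # 6.0
```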