python, python-typing

How do I formalize a repeated relationship among disjoint groups of classes in Python?


I have Python code that has the following shape to it:

from dataclasses import dataclass


@dataclass
class Foo_Data:
    foo: int


class Foo_Processor:
    def process(self, data: Foo_Data): ...


class Foo_Loader:
    def load(self, file_path: str) -> Foo_Data: ...


@dataclass
class Bar_Data:
    bar: str


class Bar_Processor:
    def process(self, data: Bar_Data): ...


class Bar_Loader:
    def load(self, file_path: str) -> Bar_Data: ...

I have several of these Data/Processor/Loader families, and within each family the classes have the same method signatures apart from the family-specific data type (Foo, Bar, etc.). Is there a Pythonic way to formalize this relationship so that the same structure is enforced if I add a Spam_Data, Spam_Processor, and Spam_Loader family? For instance, I want something to enforce that Spam_Processor has a process method that takes an argument of type Spam_Data. Can this standardization be achieved with abstract classes, generic types, or some other construct?

I tried using abstract classes, but mypy correctly points out that having all *_Data classes be subclasses of an abstract Data class and similarly having all *_Processor classes be subclasses of an abstract Processor class violates the Liskov substitution principle, since each processor is only designed for its respective Data class (i.e., Foo_Processor can't process Bar_Data, but one would expect that it could if these classes have superclasses Processor and Data which are compatible in this way).


Solution

  • You can combine abstract base classes (ABCs) with generics. This lets you define a common interface while keeping each class family type-safe:

    from abc import ABC, abstractmethod
    from dataclasses import dataclass
    from typing import Generic, TypeVar
    
    # Type variable standing for one concrete BaseData subclass
    T = TypeVar('T', bound='BaseData')
    
    
    @dataclass
    class BaseData(ABC):
        pass
    
    
    class BaseProcessor(ABC, Generic[T]):
        @abstractmethod
        def process(self, data: T) -> None:
            pass
    
    
    class BaseLoader(ABC, Generic[T]):
        @abstractmethod
        def load(self, file_path: str) -> T:
            pass
    

    Now you can define your specific classes:

    @dataclass
    class Foo_Data(BaseData):
        foo: int
    
    
    class Foo_Processor(BaseProcessor[Foo_Data]):
        def process(self, data: Foo_Data) -> None: ...
    
    
    class Foo_Loader(BaseLoader[Foo_Data]):
        def load(self, file_path: str) -> Foo_Data: ...
    
    
    @dataclass
    class Bar_Data(BaseData):
        bar: str
    
    
    class Bar_Processor(BaseProcessor[Bar_Data]):
        def process(self, data: Bar_Data) -> None: ...
    
    
    class Bar_Loader(BaseLoader[Bar_Data]):
        def load(self, file_path: str) -> Bar_Data: ...
    

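    Writing your code this way gives you a common interface without giving up type safety:

    • ABCs ensure that every concrete subclass implements the required methods: instantiating a subclass that leaves an abstract method unimplemented raises a TypeError.

    • Generics tie each processor and loader to its own data class, so a type checker such as mypy flags a call like Foo_Processor().process(Bar_Data(bar="x")). Because Foo_Processor subclasses BaseProcessor[Foo_Data] rather than a processor of arbitrary data, the override does not violate the Liskov substitution principle.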
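
    To illustrate the runtime side of the ABC guarantee, here is a minimal sketch. The Spam_* classes and the misspelled method are hypothetical, borrowed from the naming in the question; they are only there to show what happens when a new family forgets to implement process:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T", bound="BaseData")


@dataclass
class BaseData(ABC):
    pass


class BaseProcessor(ABC, Generic[T]):
    @abstractmethod
    def process(self, data: T) -> None: ...


# Hypothetical new family, named after the Spam_* example in the question.
@dataclass
class Spam_Data(BaseData):
    spam: float


class Spam_Processor(BaseProcessor[Spam_Data]):
    # Deliberate mistake: the name is misspelled, so the abstract
    # `process` method is never overridden.
    def proces(self, data: Spam_Data) -> None: ...


try:
    Spam_Processor()
    instantiated = True
except TypeError as exc:
    # ABCs refuse to instantiate a class with unimplemented abstract methods.
    instantiated = False
    print(f"rejected at instantiation: {exc}")
```

    Note that this check fires when the class is instantiated, not when it is defined. The argument type of process is not checked at runtime at all; that part is left to the type checker.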

    Running mypy on the combined script confirms that it type-checks:

    mypy script.py
    Success: no issues found in 1 source file
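
    One further way to make the family relationship explicit is a generic function that accepts a loader and a processor sharing the same type variable. The sketch below is an assumption layered on top of the answer: the run helper, the seen attribute, and the placeholder load body are all hypothetical and not part of the original code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T", bound="BaseData")


@dataclass
class BaseData(ABC):
    pass


class BaseProcessor(ABC, Generic[T]):
    @abstractmethod
    def process(self, data: T) -> None: ...


class BaseLoader(ABC, Generic[T]):
    @abstractmethod
    def load(self, file_path: str) -> T: ...


@dataclass
class Foo_Data(BaseData):
    foo: int


class Foo_Loader(BaseLoader[Foo_Data]):
    def load(self, file_path: str) -> Foo_Data:
        # Placeholder body: a real loader would parse the file at file_path.
        return Foo_Data(foo=len(file_path))


class Foo_Processor(BaseProcessor[Foo_Data]):
    def __init__(self) -> None:
        # Hypothetical attribute, used here only to make the effect observable.
        self.seen: list[int] = []

    def process(self, data: Foo_Data) -> None:
        self.seen.append(data.foo)


def run(loader: BaseLoader[T], processor: BaseProcessor[T], file_path: str) -> None:
    """Hypothetical helper: load one file and pass the result to the processor."""
    processor.process(loader.load(file_path))


processor = Foo_Processor()
run(Foo_Loader(), processor, "foo.txt")
```

    Because both parameters use the same T, a mismatched pair such as a Foo_Loader with a Bar_Processor should be flagged by mypy, since no single T satisfies both arguments. At runtime nothing enforces the pairing, so this only helps if the code is actually type-checked.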