So I am solving a lot of Advent of Code tasks these days, and I set myself the added challenge of writing the code following as many best practices as possible. In particular, this means using type hints, making the code as DRY as possible, and separating the data structures from the logical structures. However, I am running into a bit of a problem.
Essentially, let me lay out the parts of the code that certainly need to be written, and written only once. These are:

- `Data_i`, where `i` is an integer between 1 and 25.
- A method for parsing `Data_i` from a file. Let's assume, for the sake of argument, that this method is `load_data_i`.
- `function_i_j`, where `i` is an integer between 1 and 25 and `j` is either 1 or 2. All functions return a string, and for each `i`, the function `function_i_j` accepts an instance of type `Data_i`.

Very basically, the code I could then write to handle a particular problem would be something like this:
```python
def solve(problem_number, task_number):
    g = globals()
    g[f'function_{problem_number}_{task_number}'](g[f'load_data_{problem_number}']())
```

However this, while quite DRY, is all sorts of hacky and ugly, and not really conducive to type hinting.
Some other ideas I had were:

- A `Solver` class with abstract methods `function_1` and `function_2`, and a method `solve` that just calls one of the two abstract methods. Then have 25 classes that inherit from `Solver`. The problem here is that each class inheriting from `Solver` will accept a different data type.
- A `Solver` class that also has `data` as part of each solver, but that violates separating data from logic.

I feel more at home in C++, where the above problem could be solved by making `function_i_j` a templated class, and then explicitly instantiating it for the 25 data types.
Now, my two questions:
Minimum example with only two data types:

```python
from pathlib import Path

Data1 = str
Data2 = float

def load_data_1(file_path: Path) -> Data1:
    with open(file_path) as f:
        return f.read()

def load_data_2(file_path: Path) -> Data2:
    with open(file_path) as f:
        return float(f.readline())

def function_1_1(data: Data1) -> str:
    return data.strip()

def function_1_2(data: Data1) -> str:
    return data.upper()

def function_2_1(data: Data2) -> str:
    return f'{data < 0}'

def function_2_2(data: Data2) -> str:
    return f'{data > 3.16}'

def main(problem_number: int, version_number: int) -> None:
    g = globals()
    function_to_call = g[f'function_{problem_number}_{version_number}']
    data_loader = g[f'load_data_{problem_number}']
    data_path = f'/path/to/data_{problem_number}.txt'
    print(function_to_call(data_loader(data_path)))
```
There is no equivalent to what you describe as a templated function in Python.
Getting an object by name dynamically (i.e. at runtime) will always make it impossible to infer its type for a static type checker. A type checker does not execute your code, it just reads it.
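To illustrate that limitation concretely (the function `greet` here is just a made-up stand-in, not part of the actual solution): once a callable is fetched from `globals()`, a static checker such as mypy only sees `Any`, so wrong arguments slip through unchecked.

```python
def greet(name: str) -> str:
    return f"hello {name}"

# Direct call: a type checker verifies the argument type.
direct = greet("world")

# Dynamic lookup: the checker infers `Any` from here on.
fn = globals()["greet"]
dynamic = fn("world")  # works at runtime, but unchecked
oops = fn(42)          # a checker cannot flag this mistake
print(direct, dynamic, oops)
```

Note that `fn(42)` still "works" at runtime only because the f-string happily formats any object; the checker simply had no chance to warn us.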
There are a few patterns and workarounds available to achieve code that more or less satisfies your constraints.
Here is how I understand the problem.
We are given N distinct data schemas (with N ≈ 25). Each of those schemas should be represented by its own data class. These will be our data types.
There should be a distinct function for each of our data classes that loads a file and parses its contents into an instance of that data class. We'll refer to them as our load functions. The logic for each of those load functions is given; they should all accept a file path and return an instance of their corresponding data class. There will consequently be N load functions.
For each data type, we are given M distinct algorithms (with M ≈ 2). Each of these algorithms shall have its own function that takes an instance of its corresponding class and returns a string. We'll call them our solver functions. Thus, there will be a total of N × M solver functions.
Each data type will be encoded with an integer `i` between 1 and N, which we will call the problem number. Each solver function for a given data type (i.e. for a given problem number) will be encoded with an integer `j` between 1 and M, which we will call the version number.

We are given N different files of data, each corresponding to a different data type. All the files reside in the same directory and are named `data_i.txt`, where `i` stands for its corresponding problem number.

The input to our main program will be two integers `i` and `j`.

The task is to load the `i`-th data file from disk, parse it into the corresponding data class's instance via its matching load function, call the `j`-th solver function defined for that data type on that instance, and print its output.
Where any of these constraints stand in conflict with one another, we should strive for a reasonable balance between them.
Three files in one package (+ `__init__.py`):

- `data.py` containing the data class definitions (and related code)
- `solver.py` containing the solver functions (and related code)
- `main.py` with the main function/script

I may reduce the number of blank lines/line breaks below what is typically suggested in style guides in the following to improve readability (reduce scrolling) on this site.
### The `data` module

Everything, literally everything (aside from keywords like `if` or `def`) in Python is an object and thus an instance of a class. Without further information we can assume that data of a certain schema can be encapsulated by an instance of a class. Python's standard library, for example, provides the `dataclasses` module that may be useful in such situations. Very good third-party libraries exist, too.
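As a quick sketch of what the `dataclasses` module buys us (the `Point` class here is just an illustrative stand-in, not one of our actual data classes):

```python
from dataclasses import dataclass

@dataclass
class Point:
    # The decorator generates __init__, __repr__, and __eq__ from
    # these annotated fields, so none of that boilerplate is written by hand.
    x: float
    y: float

p = Point(1.0, 2.0)
print(p)                     # Point(x=1.0, y=2.0)
print(p == Point(1.0, 2.0))  # True (field-wise comparison)
```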
To utilize the benefits that object-oriented programming provides, honor the DRY principle, and improve code reuse and type clarity, among other things, we can define one base data class that all our N data classes will inherit from.
Since the load function has an intimate 1:1 relationship with our data type, it is entirely reasonable to make it a method of our data classes. Since the logic is different for each individual data class, but each of them will have one, this is the perfect use case for abstract base classes (ABC) and the `abstractmethod` decorator provided by the `abc` module. We can define our base class as abstract and force any subclass to implement a `load` method, aside from its own data fields, of course.
`data.py`:

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import TypeVar

__all__ = [
    "AbstractData",
    "Data1",
    "Data2",
    # ...
]

D = TypeVar("D", bound="AbstractData")


class AbstractData(ABC):
    @classmethod
    @abstractmethod
    def load(cls: type[D], file_path: Path) -> D: ...


@dataclass
class Data1(AbstractData):
    x: str

    @classmethod
    def load(cls, file_path: Path) -> Data1:
        with file_path.open("r") as f:
            return Data1(x=f.readline())


@dataclass
class Data2(AbstractData):
    y: float

    @classmethod
    def load(cls, file_path: Path) -> Data2:
        with file_path.open("r") as f:
            return Data2(y=float(f.readline()))

...
```
To be able to express that the type of the `Data1.load` class method is a subtype of `AbstractData.load`, we annotate the latter with a type variable in such a way that a type checker expects the output of that method to be of the specific type that it binds to (i.e. `cls`). That type variable further receives an upper bound of `AbstractData` to indicate that not just any type object is valid in this context, but only subtypes of `AbstractData`.
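Stripped of the file loading, the self-typed classmethod pattern looks like this in isolation (`Base`/`Sub` are throwaway names for illustration only):

```python
from typing import TypeVar

T = TypeVar("T", bound="Base")

class Base:
    @classmethod
    def create(cls: type[T]) -> T:
        # `cls` is the actual class this was called on, so a type
        # checker infers the result of `Sub.create()` as `Sub`,
        # not merely as `Base`.
        return cls()

class Sub(Base):
    pass

obj = Sub.create()
print(type(obj).__name__)  # Sub
```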
### The `solver` module

Introduce a base solver class and a subclass for each problem number. Regarding abstractness and inheritance, the same ideas apply.
The difference this time is that we can make the base solver class generic in terms of the data class it deals with. This allows us (with a few tricks) to minimize code, while maintaining type safety.
A solver will have an attribute that can hold a reference to an instance of its corresponding data class. When initializing a solver, we can provide the path to a data file to immediately load and parse the data and save an instance of its data class in that attribute of the solver. (And/Or we can load it later.)
We will write a `get_solver` function that takes the problem number as its argument and returns the corresponding solver class. It will still use the approach of fetching it from the `globals()` dictionary, but we will make this as type safe, runtime safe, and clean as possible (given the situation).

To have knowledge of the narrowest possible type, i.e. the concrete solver subclass returned by `get_solver`, we will have no choice but to use the `Literal` + `overload` pattern. And yes, that means N distinct signatures for the same function. (Notice the trade-off "DRY vs. type safe" here.)
`solver.py`:

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Generic, Literal, TypeAlias, TypeVar
from typing import get_args, get_origin, overload

from .data import *

__all__ = [
    "AbstractBaseSolver",
    "Solver1",
    "Solver2",
    "ProblemNumT",
    "VersionNumT",
    "get_solver",
]

D = TypeVar("D", bound=AbstractData)


class AbstractBaseSolver(ABC, Generic[D]):
    _data_type: type[D] | None = None  # narrowed in specified subclasses
    _data: D | None = None  # narrowed via instance property

    @classmethod
    def __init_subclass__(cls, **kwargs: object) -> None:
        """
        Initializes a subclass and narrows the `_data_type` attribute on it.

        It does this by identifying this specified class among all original
        base classes and extracting the provided type argument.
        Details: https://stackoverflow.com/questions/73746553/
        """
        super().__init_subclass__(**kwargs)
        for base in cls.__orig_bases__:  # type: ignore[attr-defined]
            origin = get_origin(base)
            if origin is None or not issubclass(origin, AbstractBaseSolver):
                continue
            type_arg = get_args(base)[0]
            # Do not set the attribute for GENERIC subclasses!
            if not isinstance(type_arg, TypeVar):
                cls._data_type = type_arg
                return

    @classmethod
    def get_data_type(cls) -> type[D]:
        if cls._data_type is None:
            raise AttributeError(
                f"{cls.__name__} is generic; type argument unspecified"
            )
        return cls._data_type

    def __init__(self, data_file_path: Path | None = None) -> None:
        if data_file_path is not None:
            self.load_data(data_file_path)

    def load_data(self, file_path: Path) -> None:
        self._data = self.get_data_type().load(file_path)

    @property
    def data(self) -> D:
        if self._data is None:
            raise AttributeError("No data loaded yet")
        return self._data

    @abstractmethod
    def function_1(self) -> str:
        ...

    @abstractmethod
    def function_2(self) -> str:
        ...


class Solver1(AbstractBaseSolver[Data1]):
    def function_1(self) -> str:
        return self.data.x.strip()

    def function_2(self) -> str:
        return self.data.x.upper()


class Solver2(AbstractBaseSolver[Data2]):
    def function_1(self) -> str:
        return str(self.data.y ** 2)

    def function_2(self) -> str:
        return self.data.y.hex()


ProblemNumT: TypeAlias = Literal[1, 2]
VersionNumT: TypeAlias = Literal[1, 2]


@overload
def get_solver(problem_number: Literal[1]) -> type[Solver1]:
    ...

@overload
def get_solver(problem_number: Literal[2]) -> type[Solver2]:
    ...

def get_solver(problem_number: ProblemNumT) -> type[AbstractBaseSolver[D]]:
    cls_name = f"Solver{problem_number}"
    try:
        cls = globals()[cls_name]
    except KeyError:
        raise NameError(f"`{cls_name}` class not found") from None
    assert isinstance(cls, type) and issubclass(cls, AbstractBaseSolver)
    return cls
```
That whole `__init_subclass__`/`get_data_type` hack is something I explain in more detail here. It allows utilizing the (specific) type argument passed to `__class_getitem__`, when we subclass `AbstractBaseSolver`, at runtime. This allows us to write the code for instantiating, loading, and accessing the data class instance only once, yet remain entirely type safe with it across all subclasses. The idea is to only write the `function_1`/`function_2` methods on each subclass after specifying the type argument, and nothing else.
The code inside the `function_1`/`function_2` methods is obviously just for demo purposes, but it again illustrates type safety across the board quite nicely.
To be perfectly clear, the `ProblemNumT` type alias will need to be expanded to the number of problems/data types, i.e. `Literal[1, 2, 3, 4, 5, ...]`. The call signature for `get_solver` will likewise need to be written out N times. If anyone has a better idea than repeating the `overload`ed signature 25 times, I am eager to hear it, as long as the annotations remain type safe.
The actual implementation of `get_solver` is cautious with the dictionary lookup and transforms the error a bit to keep it in line with typical Python behavior when a name is not found. The last `assert` is for the benefit of the static type checker, to convince it that what we are returning is as advertised, but it is likewise an assurance for us at runtime that we did not mess up along the way.
### The `main` module

Not much to say here. Assuming two function versions for each solver/data type, the `if` statements are totally fine. If that number increases, well ... you get the idea. What is nice is that we know exactly which solver we get, depending on the integer we pass to `get_solver`. All the rest is also safe and pretty much self-explanatory:
`main.py`:

```python
from pathlib import Path

from .solver import ProblemNumT, VersionNumT, get_solver

DATA_DIR_PATH = Path(__file__).parent


def main(problem_number: ProblemNumT, version_number: VersionNumT) -> None:
    solver_cls = get_solver(problem_number)
    data_file_path = Path(DATA_DIR_PATH, f"data_{problem_number}.txt")
    solver = solver_cls(data_file_path)
    if version_number == 1:
        print(solver.function_1())
    elif version_number == 2:
        print(solver.function_2())
    else:
        raise ValueError("Version number must be 1 or 2")


if __name__ == "__main__":
    main(1, 2)
    main(2, 1)
```
If we put a `data_1.txt` with `foo` in its first line and a `data_2.txt` with `2.0` in its first line into the package's directory and run the script with `python -m package_name.main`, the output will be as expected:

```
FOO
4.0
```

There are no complaints from `mypy --strict` about that package.
This is the best I could come up with after the little back and forth in the comments. If this illustrates a grave misunderstanding, feel free to point it out. It still seems to me that your question is very broad and allows a lot of room for interpretation, which makes pedants like me uncomfortable. I don't consider myself an expert, but I hope this still illustrates a few patterns and tricks that Python offers when trying to write clean code.