Tags: python, python-typing, python-dataclasses

How can I type hint that the __init__ params are the same as the fields in a dataclass?


Let us say I have a custom use case, and I need to dynamically create or define the __init__ method for a dataclass.

For example, say I need to decorate the class with @dataclass(init=False) and then define an __init__() method that takes keyword arguments, i.e. **kwargs. Within the kwargs, I only check for the presence of known dataclass fields and set those attributes accordingly (example below).

I would like to type hint to my IDE (PyCharm) that the modified __init__ only accepts the listed dataclass fields as parameters or keyword arguments. I am unsure if there is a way to approach this, using the typing library or otherwise. I know that Python 3.11 has dataclass transforms planned, which may or may not do what I am looking for (my gut feeling is no).

Here is some sample code I was playing around with; it is a basic case that illustrates the problem I am having:

from dataclasses import dataclass


# get value from input source (can be a file or anything else)
def get_value_from_src(_name: str, tp: type):
    return tp()  # dummy value


@dataclass
class MyClass:
    foo: str
    apple: int

    def __init__(self, **kwargs):
        for name, tp in self.__annotations__.items():
            if name in kwargs:
                value = kwargs[name]
            else:
                # here is where I would normally have the logic
                # to read the value from another input source
                value = get_value_from_src(name, tp)
                if value is None:
                    raise ValueError

            setattr(self, name, value)


c = MyClass(apple=None)
print(c)

c = MyClass(foo='bar',  # here, I would like to auto-complete the name
                        # when I start typing `apple`
            )
print(c)

If we assume that the number or names of the fields are not fixed, I am curious whether there could be a generic approach which would basically say to type checkers: "the __init__ of this class accepts only (optional) keyword arguments that match up with the fields defined on the dataclass itself".


Addenda, based on notes in the comments below:

  • Passing @dataclass(kw_only=True) won't work: I am writing this for a library and need to support Python 3.7+, while kw_only was only added in Python 3.10. Also, kw_only has no effect when a custom __init__() is implemented, as is the case here.

  • The above is just a stub __init__ method; it could have more complex logic, such as setting attributes based on a file source, for example. Basically, the above is just a sample implementation of a larger use case.

  • I can't change each field to foo: Optional[str] = None because that part would be implemented in user code, which I have no control over. Also, annotating fields this way doesn't make sense when you know a custom __init__() method will be generated for you - meaning not by dataclasses. Lastly, setting a default for each field just so that the class can be instantiated without arguments, like MyClass(), doesn't seem like the best idea to me.

  • It would not work to let dataclasses auto-generate __init__ and instead implement __post_init__(). I need to be able to construct the class without arguments, like MyClass(), because the field values will be set from another input source (think a local file or elsewhere); all fields are therefore required, so annotating them as Optional would be misleading. I still need to support optional keyword arguments from the user, but these **kwargs will always match up with the dataclass field names, and so I want some way for auto-completion to work with my IDE (PyCharm).

Hope this post clarifies the expectations and desired result. If there are any questions or anything that is a bit vague, please let me know.


Solution

  • What you are describing is impossible in theory and unlikely to be viable in practice.

    TL;DR

    Type checkers don't run your code, they just read it. A dynamic type annotation is a contradiction in terms.

    Theory

    As I am sure you know, the term static type checker is not coincidental. A static type checker does not execute the code you write. It just parses it and infers types according to its own internal logic, by applying certain rules to a graph that it derives from your code.

    This is important because, unlike some other languages, Python is dynamically typed, which as you know means that the type of a "thing" (variable) can completely change at any point. In general, there is theoretically no way of knowing the type of all variables in your code without actually stepping through the entire algorithm, which is to say running the code.

    As a silly but illustrative example, you could decide to put the name of a type into a text file to be read at runtime and then used to annotate some variable in your code. Could you do that with valid Python code and typing? Sure. But I think it is beyond clear that static type checkers will never know the type of that variable.
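
    To make that concrete, here is a (contrived) sketch; the file name and the name-to-type mapping are made up purely for illustration:

    # types.txt is assumed to contain a single type name, e.g. "int" or "str"
    with open("types.txt") as f:
        type_name = f.read().strip()

    # resolve the name to an actual type object at runtime
    chosen_type = {"str": str, "int": int, "float": float}[type_name]

    # perfectly valid at runtime, but a static checker cannot possibly resolve this annotation
    value: chosen_type = chosen_type()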

    Why your proposition won't work

    Abstracting away all the dataclass stuff and the possible logic inside your __init__ method, what you are asking boils down to the following.

    "I want to define a method (__init__), but the types of its parameters will only be known at runtime."

    Why am I claiming that? I mean, you do annotate the types of the class' attributes, right? So there you have the types!

    Sure, but these have -- in general -- nothing whatsoever to do with the arguments you could pass to the __init__ method, as you yourself point out. You want the __init__ method to accept arbitrary keyword-arguments. Yet you also want a static type checker to infer which types are allowed/expected there.

    To connect the two (attribute types and method parameter types), you could of course write some kind of logic. You could even implement it in a way that enforces adherence to those types: the logic could read the type annotations of the class attributes, match the **kwargs against them, and raise a TypeError on any mismatch. This is entirely possible, and you have almost implemented it already in your example code. But it only works at runtime!
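
    Just to illustrate, here is a minimal sketch of what such runtime enforcement could look like (check_kwargs is a made-up helper, not part of any library, and the isinstance check is deliberately naive):

    from dataclasses import fields, is_dataclass
    from typing import get_type_hints


    def check_kwargs(cls, kwargs: dict) -> None:
        # Reject unknown names and (naively) wrongly typed values at runtime.
        assert is_dataclass(cls)
        hints = get_type_hints(cls)
        allowed = {f.name for f in fields(cls)}
        for name, value in kwargs.items():
            if name not in allowed:
                raise TypeError(f"{cls.__name__} got an unexpected keyword argument {name!r}")
            # isinstance() cannot handle Optional, generics, etc. -- real code would need more care
            if not isinstance(value, hints[name]):
                raise TypeError(f"{name!r} should be of type {hints[name].__name__}")

    A static type checker never executes any of this, so none of it helps it infer the __init__ signature.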

    Again, a static type checker has no way to infer that, especially since your desired class is supposed to just be a base class and any descendant can introduce its own attributes/types at any point.

    But dataclasses work, don't they?

    You could argue that this dynamic way of annotating the __init__ method works with dataclasses. So why are they so different? Why are their parameters correctly inferred, while yours can't be?

    The answer is, they aren't.

    Even dataclasses don't have any magical way of telling a static type checker which parameter types the __init__ method is to expect, even though they do annotate them when they dynamically construct the method in _init_fn.
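
    You can verify at runtime that the generated __init__ really is annotated -- it is only the static view that is missing:

    import inspect
    from dataclasses import dataclass


    @dataclass
    class Point:
        x: int
        y: int


    # the dynamically constructed __init__ carries proper annotations at runtime...
    print(inspect.signature(Point.__init__))  # (self, x: int, y: int) -> None
    # ...but a static type checker never runs this code to find that out.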

    The only reason mypy correctly infers those types is that they implemented a separate plugin just for dataclasses. Meaning it works because they read through PEP 557 and hand-crafted a plugin for mypy that specifically facilitates type inference based on the rules described there.

    You can see the magic happening in the DataclassTransformer.transform method. You cannot generalize this behavior to arbitrary code, which is why they had to write a whole plugin just for this.
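
    For reference, this is roughly the shape such a mypy plugin takes (a bare skeleton only; mylib.magic_init is a hypothetical decorator, and the real signature synthesis done by the dataclasses plugin is far more involved):

    from typing import Callable, Optional

    from mypy.plugin import ClassDefContext, Plugin


    class MagicInitPlugin(Plugin):
        def get_class_decorator_hook(
            self, fullname: str
        ) -> Optional[Callable[[ClassDefContext], None]]:
            # react only to classes decorated with our hypothetical decorator
            if fullname == "mylib.magic_init":
                return transform_magic_init_class
            return None


    def transform_magic_init_class(ctx: ClassDefContext) -> None:
        # Here one would read the annotated fields from ctx.cls and tell mypy
        # what the synthesized __init__ should look like -- essentially what
        # mypy's own dataclasses plugin does in DataclassTransformer.transform.
        ...


    def plugin(version: str):
        return MagicInitPlugin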

    I am not familiar enough with how PyCharm does its type checking, but I strongly suspect they used something similar.

    So you could argue that dataclasses are "cheating" with regards to static type checking. Though I am certainly not complaining.

    Pragmatic solution

    Even something as "high-profile" as Pydantic, which I personally love and use extensively, requires its own mypy plugin to realize the __init__ type inference properly. For PyCharm they have their own separate Pydantic plugin, without which the internal type checker cannot provide those nice auto-suggestions for initialization etc.

    That approach would be your best bet if you really want to take this further. Just be aware that this will be (in the best sense of the word) a hack to allow specific type checkers to catch "errors" that they otherwise would have no way of catching.

    The reason I argue that it is unlikely to be viable is that it essentially blows up the amount of work for your project: you also have to cover the specific hacks for each type checker that you want to satisfy. If you are committed enough and have the resources, go for it.

    Conclusion

    I am not trying to discourage you. But it is important to know the limitations enforced by the environment. It's either dynamic types and hacky imperfect type checking (still love mypy), or static types and no "kwargs can be anything" behavior.

    Hope this makes sense. Please let me know, if I made any errors. This is just based on my understanding of typing in Python.