Is it possible to create a recursive dataclass using make_dataclass in python?


Here is a simple example where I am trying to create a recursive Node definition which contains an optional child that is also a Node. The code runs, but when I try to resolve the type hints I get a NameError saying that node is not defined. Is it possible to get around this error?

import dataclasses
import typing as t

node_type = dataclasses.make_dataclass(
    "node", [("child", t.Optional["node"], dataclasses.field(default=None))]
)
print(t.get_type_hints(node_type))

Outputs

NameError: name 'node' is not defined

I'm using python 3.9.2.


Solution

  • There are two problems here, plus a meta-problem. They're solvable, but they may not be cleanly solvable in the kinds of situations where you would actually use dataclasses.make_dataclass.

    The first problem is that typing.get_type_hints is looking for a class named 'node', but you called the global variable node_type. The name you pass to make_dataclass, the name you use in the annotations, and the name you assign the dataclass to all have to be the same:

    Node = dataclasses.make_dataclass(
        "Node", [("child", t.Optional["Node"], dataclasses.field(default=None))]
    )
    

    But that's still not going to be enough, because typing.get_type_hints isn't looking in the right namespace. That's the second problem.

    When you call typing.get_type_hints on a class, it tries to resolve string annotations by looking in the globals of the module where the class was defined. It determines that module from the class's __module__ attribute. Because you've created your node class in a weird way that doesn't go through the normal class statement, __module__ isn't set up to refer to the right module. Instead, on the Python 3.9 you're using, it's set to 'types', because make_dataclass builds the class with types.new_class.

    You can fix this by manually pre-setting __module__ to the __name__ of the current module:

    Node = dataclasses.make_dataclass(
        "Node",
        [("child", t.Optional["Node"], dataclasses.field(default=None))],
        namespace={'__module__': __name__}
    )
    

    Then typing.get_type_hints will be able to resolve the string annotations.

    The meta-problem is, if you're using dataclasses.make_dataclass in practice, you probably don't know the class name. You're probably using it in a function, and/or inside a loop. typing.get_type_hints has to be able to find the class through a global variable matching the class name, but dynamic variable names are messy.
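As an aside, the global lookup only matters when typing.get_type_hints is called without explicit namespaces. If the code calling it is your own, a minimal sketch of an alternative is to pass the name-to-class mapping through the localns argument. The helper make_node_class below is a hypothetical name used only for illustration, and this does not help third-party code that calls get_type_hints on your class by itself.

```python
import dataclasses
import typing as t


def make_node_class(name: str) -> type:
    """Build a dataclass whose 'child' field refers back to the class itself."""
    # The forward reference is simply the class name as a string.
    return dataclasses.make_dataclass(
        name,
        [("child", t.Optional[name], dataclasses.field(default=None))],
    )


# The variable name deliberately differs from the class name.
node_type = make_node_class("Node")

# Supply the name -> class mapping explicitly instead of relying on a global.
hints = t.get_type_hints(node_type, localns={node_type.__name__: node_type})
print(hints)
```

With this approach no global named after the class is needed, so generated classes with clashing names cannot overwrite each other in the module namespace.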

    You can take the simple approach of just setting a global with globals():

    globals()[your_dataclass.__name__] = your_dataclass
    

    but that's dangerous. If two generated classes have the same name, the second will replace the first. If a generated class has the same name as something else in the global namespace, such as if you did from some_dependency import Thing and then generated a class named Thing, the generated class will stomp the existing global value.

    If you can guarantee those things won't happen, globals() might be fine. If you can't make such guarantees, you might need to do something like generate a new module for each generated class to live in, so they each get their own independent global namespace, or you might just accept and document the fact that get_type_hints won't work for your generated classes.
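The one-module-per-class idea could be sketched roughly as follows. This is an illustration under assumptions, not a vetted recipe: make_isolated_node_class and the module name "_generated_treenode" are hypothetical, and the sketch assigns __module__ after the class is created so that the result does not depend on what make_dataclass itself stores there.

```python
import dataclasses
import sys
import types
import typing as t


def make_isolated_node_class(class_name: str, module_name: str) -> type:
    """Create a self-referencing dataclass that lives in its own synthetic module."""
    # A fresh module object gives the class a private global namespace.
    module = types.ModuleType(module_name)
    # get_type_hints finds a class's globals through sys.modules[cls.__module__].
    sys.modules[module_name] = module

    cls = dataclasses.make_dataclass(
        class_name,
        [("child", t.Optional[class_name], dataclasses.field(default=None))],
    )
    # Point the class at the synthetic module and publish it there under its own name.
    cls.__module__ = module_name
    setattr(module, class_name, cls)
    return cls


# Hypothetical names, chosen only for this illustration.
TreeNode = make_isolated_node_class("TreeNode", "_generated_treenode")
hints = t.get_type_hints(TreeNode)
print(hints)
```

The trade-off is that every generated class leaves an entry in sys.modules, so the module names have to be unique and the entries stay around for the life of the process unless you remove them.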