Tags: python, python-typing, python-dataclasses

How to type hint a dynamically-created dataclass


I hate writing things twice, so I came up with a way to avoid it. However, it seems to break my type hinting:

from enum import Enum
from dataclasses import make_dataclass, field, dataclass

class DatasetNames(Enum):
    test1 = "test1_string"
    test2 = "test2_string"
    test3 = "test3_string"

def get_path(s: str) -> str:
    return s + "_path"

# the normal way to do this, but I have to type every new dataset name twice
# and there's a lot of duplicate code
@dataclass(frozen=True)
class StaticDatasetPaths:
    test1 = get_path("test1_string")
    test2 = get_path("test2_string")
    test3 = get_path("test3_string")

# mypy recognizes that `StaticDatasetPaths` is a class
# mypy recognizes that `StaticDatasetPaths.test2` is a string
print(StaticDatasetPaths.test2) # 'test2_string_path'

# this is my way of doing it, without having to type every new dataset name twice and no duplicate code
DynamicDatasetPaths = make_dataclass(
    'DynamicDatasetPaths', 
    [
        (
            name.name,
            str,
            field(default=get_path(name.value))
        )
        for name in DatasetNames
    ],
    frozen=True
)

# mypy thinks `DynamicDatasetPaths` is a `variable` of type `type`
# mypy thinks that `DynamicDatasetPaths.test2` is a `function` of type `Unknown`
print(DynamicDatasetPaths.test2) # 'test2_string_path'

How can I let mypy know that DynamicDatasetPaths is a frozen dataclass whose attributes are strings?

Normally when I run into cases like this, I'm able to just use a cast and tell mypy what the right type is, but I don't know the correct type for "frozen dataclass whose attributes are strings".

(Also, if there's a better way in general to not have the duplicate code, I'd be happy to hear about that as well.)


Solution

  • A data class is meant to create instances. Since you never instantiate the data class and only access test1, test2, etc. as class attributes, you don't really need a data class at all; you can make path a property of the Enum class instead. And since every member of your Enum has a string value, you can make it a StrEnum (available since Python 3.11), which makes string operations easier:

    from enum import StrEnum
    
    class DatasetNames(StrEnum):
        test1 = "test1_string"
        test2 = "test2_string"
        test3 = "test3_string"
    
        @property
        def path(self) -> str:
            return self + '_path'
    
    print(DatasetNames.test2.path) # outputs test2_string_path
    

    If the get_path function is expensive in your actual use case, consider making path a functools.cached_property instead, so that each member computes its path only once.