python, python-dataclasses

Unable to call variable defined in dataclass


I have a data class as follows:

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict

raw_dir = r"C:..." # path of the raw dir
processed_dir = r"C:..." # path of the processed dir

@dataclass
class Files:
    raw_path: Path = Path(raw_dir)
    processed_path: Path = Path(processed_dir)

    path_dict: Dict[str, Any] = field(
        default_factory=lambda: {
            "raw_train_file": Path(raw_path, "raw_train.csv"),
            "processed_train_file": Path(processed_path, "processed_train.csv"),
        }
    )
Files().path_dict

This raises NameError: name 'raw_path' is not defined. But when I print raw_path right after the first line, it works, so the problem seems to come from path_dict. I tried replacing the key-value pair with "raw": Path(directory) and it worked, so I do not think the data type is the issue.


Context: I treat the dataclass as a config object, so that when I need a default path I can just use:

pd.read_csv(Files().path_dict["raw_train_file"])

Solution

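  • Your problem is that the default_factory has to be a zero-argument callable. It is called without the instance, so it cannot use any member variable. (Names assigned in the class body are also not visible inside the lambda, so raw_path is looked up as a global when the lambda runs, hence the NameError.) Here, as the member variables have trivial initialization, you can repeat that initialization and use only the global variables: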

    ...
    path_dict: Dict[str, Any] = field(
        default_factory=lambda: {
            "raw_train_file": Path(Path(raw_dir), "raw_train.csv"),
            "processed_train_file": Path(Path(processed_dir), "processed_train.csv"),
        }
    )

    But you can also use the special __post_init__ method, which is called by the generated __init__ after the other fields have been initialized. As it receives the self argument, it can use member variables:

    @dataclass
    class Files:
        raw_path: Path = Path(raw_dir)
        processed_path: Path = Path(processed_dir)
    
        def __post_init__(self):
            self.path_dict: Dict[str, Any] = {
                "raw_train_file": Path(self.raw_path, "raw_train.csv"),
                "processed_train_file": Path(self.processed_path, "processed_train.csv"),
            }
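
In the __post_init__ version above, path_dict is a plain instance attribute rather than a declared dataclass field, so it is left out of the generated __repr__ and __eq__. If that matters, one possible variation is to declare it with field(init=False) and still fill it in __post_init__. The sketch below is only an illustration: RAW_DIR and PROCESSED_DIR are made-up placeholder directories (the real paths are elided in the question), and the Path values are typed as Dict[str, Path] instead of Dict[str, Any].

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict

# Hypothetical placeholder directories, used only for this illustration.
RAW_DIR = "data/raw"
PROCESSED_DIR = "data/processed"


@dataclass
class Files:
    raw_path: Path = Path(RAW_DIR)
    processed_path: Path = Path(PROCESSED_DIR)
    # Declared as a field, but excluded from the generated __init__;
    # the value is computed in __post_init__ from the other fields.
    path_dict: Dict[str, Path] = field(init=False)

    def __post_init__(self) -> None:
        self.path_dict = {
            "raw_train_file": self.raw_path / "raw_train.csv",
            "processed_train_file": self.processed_path / "processed_train.csv",
        }


# Default configuration.
default_files = Files()

# Overriding raw_path also changes the derived entry in path_dict.
custom_files = Files(raw_path=Path("other/raw"))
```

Because path_dict is built from self.raw_path and self.processed_path, overriding either directory when constructing Files carries through to the dictionary. With the global-variable version of the default_factory, the dictionary would keep pointing at the original directories.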