Unable to call variable defined in dataclass

I have a data class as follows:

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict

raw_dir = r"C:..." # path of the raw dir
processed_dir = r"C:..." # path of the processed dir

@dataclass
class Files:
    raw_path: Path = Path(raw_dir)
    processed_path: Path = Path(processed_dir)

    path_dict: Dict[str, Any] = field(
        default_factory=lambda: {
            "raw_train_file": Path(raw_path, "raw_train.csv"),
            "processed_train_file": Path(processed_path, "processed_train.csv"),
        }
    )
Files().path_dict

This raises NameError: name 'raw_path' is not defined. Printing raw_path right after the first line works, so the problem seems to come from path_dict. I tried replacing the key-value pair with "raw": Path(directory) and it worked, so I do not think the data type is the issue.

Context: I use the dataclass as a kind of config object, so that when I need a default path I can just write:

pd.read_csv(Files().path_dict["raw_train_file"])

Solution

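The problem is that default_factory must be a zero-argument callable: it is called without the instance, so it cannot use any member variable. The class-level names raw_path and processed_path are not visible from inside the lambda either, because a class body is not an enclosing scope for functions defined in it, which is why you get the NameError. Here, since the member variables have trivial initializers, you can repeat that initialization and use only global variables: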

...
path_dict: Dict[str, Any] = field(
    default_factory=lambda: {
        "raw_train_file": Path(Path(raw_dir), "raw_train.csv"),
        "processed_train_file": Path(Path(processed_dir), "processed_train.csv"),
    }
)
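
To see why the original lambda fails, here is a minimal sketch that reproduces the error in isolation. The class name Broken and the placeholder directory "data/raw" are made up for illustration; the real paths in the question are elided.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict

raw_dir = "data/raw"  # hypothetical placeholder, not the path from the question


@dataclass
class Broken:
    raw_path: Path = Path(raw_dir)
    # The lambda body is its own function scope. Names bound in the class
    # body (raw_path) are not visible from it, and there is no global
    # called raw_path, so the lookup fails when the factory is called.
    path_dict: Dict[str, Any] = field(
        default_factory=lambda: {"raw_train_file": Path(raw_path, "raw_train.csv")}
    )


# Defining the class succeeds; the error only appears on instantiation,
# because that is when __init__ calls the default_factory.
try:
    Broken()
    error = None
except NameError as exc:
    error = exc

print(type(error).__name__, error)
```

Referring to the global raw_dir inside the lambda instead works, because globals are looked up normally from a function body.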

Alternatively, you can use the special __post_init__ method, which the generated __init__ calls after the other fields are initialized. Since it receives self, it can use member variables:

@dataclass
class Files:
    raw_path: Path = Path(raw_dir)
    processed_path: Path = Path(processed_dir)

    def __post_init__(self):
        self.path_dict: Dict[str, Any] = {
            "raw_train_file": Path(self.raw_path, "raw_train.csv"),
            "processed_train_file": Path(self.processed_path, "processed_train.csv"),
        }
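
A possible variant, if you want path_dict to be a real dataclass field (so it shows up in repr, comparisons and dataclasses.asdict), is to declare it with field(init=False) and still fill it in __post_init__. This is only a sketch: the directories "data/raw" and "data/processed" are hypothetical placeholders, and the value type is narrowed to Path as an assumption.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict

# Hypothetical placeholder directories; substitute your own.
raw_dir = "data/raw"
processed_dir = "data/processed"


@dataclass
class Files:
    raw_path: Path = Path(raw_dir)
    processed_path: Path = Path(processed_dir)
    # init=False keeps path_dict out of the generated __init__ signature,
    # but it is still registered as a field of the dataclass.
    path_dict: Dict[str, Path] = field(init=False)

    def __post_init__(self) -> None:
        # Runs after raw_path and processed_path are set, so overrides
        # passed to the constructor are picked up here.
        self.path_dict = {
            "raw_train_file": self.raw_path / "raw_train.csv",
            "processed_train_file": self.processed_path / "processed_train.csv",
        }


files = Files()
custom = Files(raw_path=Path("other/raw"))
print(files.path_dict["raw_train_file"])
print(custom.path_dict["raw_train_file"])
```

Unlike the global-variable approach, the dictionary here follows whatever raw_path and processed_path the instance was created with.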