Tags: python-3.x, python-dataclasses

Define a dataclass with an attribute that is a list of itself


I am working with Python 3 and have just started learning about dataclasses.

I am trying to create a dataclass with an attribute that is a list of instances of itself.

Something like:

@dataclass
class Directory:
    name: str = field(default_factory=generate_randomly)
    num_of_files: int = 0
    ...
    subdirectories: List[Directory] = []

What I'm struggling with is how to define the subdirectories attribute, which is a list of Directory objects.

If I try this:

dir1 = Directory('folder1')
dir2 = Directory('folder2')
dir = Directory(subfolders=[dir1, dir2])

Traceback (most recent call last):
  File "main.py", line 14, in <module>
    class Directory:
  File "main.py", line 17, in Directory
    subfolders: List(Directory) = []
NameError: name 'Directory' is not defined

I saw one post here, but it doesn't look like what I need.


Solution

  • This looks like a good start so far, though you have a few minor issues:

    1. Use class rather than def, since you're creating a class; dataclasses are just regular Python classes.
    2. For forward references (here, Directory is not yet defined while its own class body is being executed), wrap the type in single or double quotes so it becomes a string, which is evaluated lazily rather than at class-definition time.
    3. Use dataclasses.field() with a default_factory argument for mutable types like list, dict, and set; dataclasses rejects a bare mutable default such as [] with a ValueError.
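    A minimal sketch of points 2 and 3 in isolation (the classes Bad and Node are hypothetical, for illustration only): the bare [] default fails as soon as the class is created, while default_factory=list gives each instance its own fresh list.

```python
from dataclasses import dataclass, field
from typing import List

# A bare mutable default is rejected when @dataclass processes the class.
try:
    @dataclass
    class Bad:
        children: List['Bad'] = []
except ValueError as exc:
    error_message = str(exc)
    print('ValueError:', exc)


@dataclass
class Node:
    # The quoted annotation is just a string, so Node need not exist yet.
    children: List['Node'] = field(default_factory=list)


a = Node()
b = Node()
a.children.append(Node())
# Each instance got its own list from default_factory, so b is unaffected.
print(len(a.children), len(b.children))  # 1 0
```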

    Example code putting it all together:

    import random
    import string
    from dataclasses import field, dataclass
    from typing import List
    
    
    def generate_randomly():
        return ''.join(random.choice(string.ascii_letters) for _ in range(15))
    
    
    @dataclass
    class Directory:
        name: str = field(default_factory=generate_randomly)
        num_of_files: int = 0
        subdirectories: List['Directory'] = field(default_factory=list)
    
    
    print(Directory())
    
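    With that definition, the usage from the question works, as long as you pass subdirectories= (the field name) rather than subfolders=. A minimal sketch that builds a small tree; total_files is a hypothetical helper, added here only to show that the self-referencing list can be walked recursively:

```python
import random
import string
from dataclasses import dataclass, field
from typing import List


def generate_randomly():
    return ''.join(random.choice(string.ascii_letters) for _ in range(15))


@dataclass
class Directory:
    name: str = field(default_factory=generate_randomly)
    num_of_files: int = 0
    subdirectories: List['Directory'] = field(default_factory=list)


def total_files(directory: Directory) -> int:
    """Hypothetical helper: count files in a directory and all its descendants."""
    return directory.num_of_files + sum(
        total_files(sub) for sub in directory.subdirectories
    )


dir1 = Directory('folder1', num_of_files=3)
dir2 = Directory('folder2', num_of_files=2, subdirectories=[Directory('nested', 4)])
root = Directory('root', subdirectories=[dir1, dir2])

print([sub.name for sub in root.subdirectories])  # ['folder1', 'folder2']
print(total_files(root))  # 9
```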

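    In Python 3.7+, you can add a __future__ import so that all annotations are stored as strings (postponed evaluation, PEP 563) instead of being evaluated when the class is defined. This means you don't need the quotes, and combined with the built-in list[...] syntax you don't even need the import from the typing module. Note that actually evaluating list[Directory], for example with typing.get_type_hints(), requires Python 3.9+.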

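    from __future__ import annotations
    
    import random
    import string
    from dataclasses import field, dataclass
    
    
    def generate_randomly():
        return ''.join(random.choice(string.ascii_letters) for _ in range(15))
    
    
    @dataclass
    class Directory:
        name: str = field(default_factory=generate_randomly)
        num_of_files: int = 0
        subdirectories: list[Directory] = field(default_factory=list)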
    
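    To see what the __future__ import actually changes, you can inspect the annotations: they are kept as plain strings and never evaluated when the class is created, which is why Directory can refer to itself. A minimal sketch, trimmed to two fields with a fixed default name ('root' is an arbitrary placeholder, not from the original):

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class Directory:
    name: str = 'root'
    subdirectories: list[Directory] = field(default_factory=list)


# With postponed evaluation, each annotation is stored as its source text.
print(Directory.__annotations__)  # {'name': 'str', 'subdirectories': 'list[Directory]'}
# dataclasses keeps the same unevaluated string in each Field's .type attribute.
print([f.type for f in fields(Directory)])  # ['str', 'list[Directory]']

tree = Directory(subdirectories=[Directory('a'), Directory('b')])
print([d.name for d in tree.subdirectories])  # ['a', 'b']
```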

    dataclasses doesn't check types at runtime, so to validate that each element of subdirectories is actually a Directory, you can add a check in __post_init__():

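        # Inside the Directory class:
        def __post_init__(self):
            # Runs right after the generated __init__; check every element manually
            for subdir in self.subdirectories:
                if not isinstance(subdir, Directory):
                    raise TypeError(f'{subdir}: invalid type ({type(subdir)}) for subdirectory')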
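    Putting the check into a complete class gives something like the following minimal sketch. The fixed default name 'untitled' stands in for generate_randomly to keep it short, and the error message uses !r and __name__ for readability; both are assumptions, not part of the original answer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Directory:
    name: str = 'untitled'
    num_of_files: int = 0
    subdirectories: List['Directory'] = field(default_factory=list)

    def __post_init__(self):
        # Runs after the generated __init__; dataclasses itself does no type checking.
        for subdir in self.subdirectories:
            if not isinstance(subdir, Directory):
                raise TypeError(
                    f'{subdir!r}: invalid type ({type(subdir).__name__}) for subdirectory'
                )


ok = Directory('root', subdirectories=[Directory('a'), Directory('b')])

try:
    Directory('root', subdirectories=['not a directory'])
except TypeError as exc:
    message = str(exc)
    print(message)  # 'not a directory': invalid type (str) for subdirectory
```

    Keep in mind that __post_init__ only runs when an instance is constructed, so appending a non-Directory to subdirectories afterwards is not caught.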