
Python: how to access dataclass properties in a list of dataclasses


Using Python 3.10.4

Hi all, I'm putting together a script that reads a YAML file with k8s cluster info, and I'd like to treat the loaded YAML as dataclasses so I can access the values as attributes using dot notation.

Example yaml:

account: 12345
clusters:
  - name: cluster_1
    endpoint: https://cluster_2
    certificate: abcdef
  - name: cluster_1
    endpoint: https://cluster_2
    certificate: abcdef

And here's my script for loading and accessing it:

import yaml
from dataclasses import dataclass

@dataclass
class ClusterInfo:
    _name: str
    _endpoint: str
    _certificate: str

@dataclass
class AWSInfo:
    _account: int
    _clusters: list[ClusterInfo]


# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r'D:\git\clusters.yml', 'r') as f:
    clusters = yaml.safe_load(f)
a = AWSInfo(
  _account=clusters['account'],
  _clusters=clusters['clusters']
)
print(a._account) # prints 12345
print(a._clusters) # prints the list of both cluster dicts
print(a._clusters[0]) # prints the dict of the first cluster

# These prints fail with AttributeError: 'dict' object has no attribute '_endpoint'
print(a._clusters[0]._endpoint)
for c in a._clusters:
    print(c._endpoint)

So my question is: what am I doing wrong in the last prints? How can I access the properties of each member in a dataclass list of dataclass objects?


Solution

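  • The dataclasses module doesn't provide built-in support for this use case, i.e. loading YAML data into a nested class model. Type annotations on dataclass fields are not enforced or converted at runtime, so the list passed in as _clusters remains a list of plain dicts, which is why the attribute access fails.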

    In such a scenario, I would turn to a ser/de library such as dataclass-wizard, which provides out-of-the-box support for (de)serializing YAML data via the PyYAML library.

    Disclaimer: I am the creator and maintainer of this library.

    Step 1: Generate a Dataclass Model

    Note: I will likely need to make it easier to generate a dataclass model from YAML data; it may be worth creating an issue to look into as time allows. Ideally this would be done from the CLI, but that is tricky with YAML data because the utility tool expects JSON.

    So it is easiest to do this in Python itself, for now:

    from json import dumps
    
    # pip install PyYAML dataclass-wizard
    from yaml import safe_load
    from dataclass_wizard.wizard_cli import PyCodeGenerator
    
    yaml_string = """
    account: 12345
    clusters:
      - name: cluster_1
        endpoint: https://cluster_2
        certificate: abcdef
      - name: cluster_1
        endpoint: https://cluster_2
        certificate: abcdef
    """
    
    py_code = PyCodeGenerator(experimental=True, file_contents=dumps(safe_load(yaml_string))).py_code
    print(py_code)
    

    Prints:

    from __future__ import annotations
    
    from dataclasses import dataclass
    
    from dataclass_wizard import JSONWizard
    
    
    @dataclass
    class Data(JSONWizard):
        """
        Data dataclass
    
        """
        account: int
        clusters: list[Cluster]
    
    
    @dataclass
    class Cluster:
        """
        Cluster dataclass
    
        """
        name: str
        endpoint: str
        certificate: str
    

    Step 2: Use the Generated Dataclass Model alongside YAMLWizard

    Contents of my_file.yml:

    account: 12345
    clusters:
      - name: cluster_1
        endpoint: https://cluster_5
        certificate: abcdef
      - name: cluster_2
        endpoint: https://cluster_7
        certificate: xyz
    

    Python code:

    from __future__ import annotations
    
    from dataclasses import dataclass
    from pprint import pprint
    
    from dataclass_wizard import YAMLWizard
    
    
    @dataclass
    class Data(YAMLWizard):
        account: int
        clusters: list[Cluster]
    
    
    @dataclass
    class Cluster:
        name: str
        endpoint: str
        certificate: str
    
    
    data = Data.from_yaml_file('./my_file.yml')
    pprint(data)
    for c in data.clusters:
        print(c.endpoint)
    

    Result:

    Data(account=12345,
         clusters=[Cluster(name='cluster_1',
                           endpoint='https://cluster_5',
                           certificate='abcdef'),
                   Cluster(name='cluster_2',
                           endpoint='https://cluster_7',
                           certificate='xyz')])
    https://cluster_5
    https://cluster_7
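
If adding a dependency is not an option, the same result can probably be had with the standard library alone. The following is only a minimal sketch, not part of dataclass-wizard: it assumes the field names are written without the leading underscores used in the question (so they match the YAML keys), and it uses a hypothetical `loaded` dict as a stand-in for what `yaml.safe_load` would return. Because dataclasses do not convert values based on annotations, the nested dicts are converted by hand in `__post_init__`.

```python
from dataclasses import dataclass


@dataclass
class ClusterInfo:
    name: str
    endpoint: str
    certificate: str


@dataclass
class AWSInfo:
    account: int
    clusters: list[ClusterInfo]

    def __post_init__(self):
        # Annotations are not enforced at runtime, so any plain dicts
        # are converted into ClusterInfo instances here.
        self.clusters = [
            c if isinstance(c, ClusterInfo) else ClusterInfo(**c)
            for c in self.clusters
        ]


# Hypothetical stand-in for the result of yaml.safe_load() on my_file.yml.
loaded = {
    "account": 12345,
    "clusters": [
        {"name": "cluster_1", "endpoint": "https://cluster_5", "certificate": "abcdef"},
        {"name": "cluster_2", "endpoint": "https://cluster_7", "certificate": "xyz"},
    ],
}

info = AWSInfo(**loaded)
endpoints = [c.endpoint for c in info.clusters]
print(endpoints)
```

This approach needs one `__post_init__` per nesting level and raises a `TypeError` if the YAML contains keys the dataclass does not declare, so a ser/de library is likely the better fit for larger or changing models.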