python, ctypes, pyyaml

YAML representation of a dictionary from a multi-level ctypes Structure contains a strange object


TL;DR: I have a ctypes Structure with another Structure inside it, which I convert (seemingly correctly) to a Python dictionary and then try to dump to YAML. However, the value of the inner Structure is dumped incorrectly.

Python version: 3.10
PyYAML version: 6.0

Background: I am trying to make our internal configuration handling more user-friendly. Our configuration files are serialized copies of the data in a C structure, and until now they have been edited manually with a hex editor. My plan is to make this process more human-readable, in the short term by reading and writing YAML files.

I am using ctypes Structures to serialize and deserialize the data, mainly because I have to support bitfields. The ctypes Structure can also be used as a template, without actual values in it.

I have simplified the code by removing irrelevant functions and shortening the structures.

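from ctypes import Structure, Union, c_int8, c_int16, c_uint8

import numpy as np

# My_Structure is shown further down; it must be defined before these classes.
class substruct_t(My_Structure):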
    _pack_ = 1
    _fields_ = [
        ("app", c_uint8, 4),
        ("showScale", c_uint8, 2),
        ("showIdleTemp", c_uint8, 2),
        ("type", c_uint8),
    ]

class ee_struct(My_Structure):
    _pack_ = 1
    _fields_ = [
        ("txStatusDelay", c_uint8, 5),
        ("overrideHours", c_uint8, 3),
        ("manualSet", c_int16),
        ("tempOffset", c_int8),

        ("substruct", substruct_t),

        ("LogArr", (c_uint8*6)*3),
        ("frostTemp", c_int8),
        ("fileVer", c_uint8*4),
    ]

class eeprom_t(Union):
    _fields_ = [("as_struct", ee_struct), ("as_bytes", c_uint8*29)]

    def __str__(self) -> str:
        return str(self.as_struct)
    
    def get_as_dict(self):
        return self.as_struct.as_dict()
    
    def get_as_bytes(self):
        return np.ndarray((29, ), 'b', self.as_bytes, order='C')

I have created a My_Structure class, which inherits from ctypes.Structure and provides different representations of the data, including a dict. This seems to work well.

# Child class of Structure with string and dictionary representation functions, unwrapping arrays
class My_Structure(Structure):
    def __recursive_carray_get(self, value):
        # Necessary recursive function, if value is ctype array
        if hasattr(value, '__len__'):
            rtn = list()
            for i in range(value.__len__()):
                rtn.append(self.__recursive_carray_get(value.__getitem__(i)))
        else:
            rtn = value
        return rtn

    def __handle_array_type__(self, type):
        # example unformatted type: <class '__main__.c_ubyte_Array_6_Array_3'>
        # NOTE: StringBetween is a helper that is not shown in this excerpt
        return StringBetween("'", "'", str(type)).split(".")[1]
    
    def __repr__(self) -> str:
        return str(self.as_dict())

    def __str__(self) -> str:
        values = ",\n".join(f"{name}={value['value']}" for name, value in self.as_dict().items())
        return f"<{self.__class__.__name__}: {values}>"
    
    def as_dict(self) -> dict:
        return {field[0]: {'value': self.__recursive_carray_get(getattr(self, field[0])), 'type': self.__handle_array_type__(field[1])}
                for field in self._fields_}

However, when I dump this dict into a YAML file, the value of substruct within ee_struct is dumped badly, as a Python object. I do not understand why, as a type check shows it is still a dict.

 ### Dict representation of substructure:
{'value': {'app': {'value': 2, 'type': 'c_ubyte'}, 'showScale': {'value': 0, 'type': 'c_ubyte'}, 'showIdleTemp': {'value': 1, 'type': 'c_ubyte'}, 'type': {'value': 2, 'type': 'c_ubyte'}}, 'type': 'substruct_t'}

 ### pyyaml dump of entire structure:
txStatusDelay: {value: 8, type: c_ubyte}
overrideHours: {value: 1, type: c_ubyte}
manualSet: {value: 100, type: c_short}
tempOffset: {value: 0, type: c_byte}
substruct:
  value: !!python/object/apply:_ctypes._unpickle
  - !!python/name:__main__.substruct_t ''
  - !!python/tuple
    - {}
    - !!binary |
      QgI=
  type: substruct_t
LogArr:
  value:
  - [1, 0, 0, 0, 0, 0]
  - [2, 0, 0, 0, 0, 0]
  - [3, 0, 0, 0, 0, 0]
  type: c_ubyte_Array_6_Array_3
frostTemp: {value: 16, type: c_byte}
fileVer:
  value: [65, 66, 104, 0]
  type: c_ubyte_Array_4

If I use a hardcoded dict with the exact same contents I get from get_as_dict, everything works.

Apparently, the dump functions don't get the same data as what gets printed from get_as_dict. Why is that, and how can I fix it?

What I tried:

My first idea was to implement a recursive function that returns a dict for inner structures (similar to what I did for arrays), but I was not sure where to start, as substruct is already reported as a dict, and using the hardcoded string representation works.

"How to export a Pydantic model instance as YAML with URL type as string" seemed like a good approach, but combining Structure and YAMLObject resulted in a metaclass conflict, which I was unable to resolve.

I tried dumping to JSON and using ruamel.yaml; both throw an exception complaining about substruct_t.
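
The exception is itself a useful clue: unlike print, json.dumps only accepts plain Python containers and scalars, so it names the first value that is not one. Below is a minimal sketch of that behaviour. The Inner and Outer classes are hypothetical stand-ins (not the classes from the question), and the default= hook at the end is just one possible workaround, not the fix from the answer.

```python
import ctypes
import json


class Inner(ctypes.Structure):
    _fields_ = [("x", ctypes.c_uint8, 4), ("y", ctypes.c_uint8, 4)]

    def as_dict(self) -> dict:
        return {name: getattr(self, name) for name, *_ in self._fields_}


class Outer(ctypes.Structure):
    _fields_ = [("n", ctypes.c_int16), ("inner", Inner)]

    def as_dict(self) -> dict:
        # Shallow on purpose: "inner" stays an Inner instance.
        return {name: getattr(self, name) for name, *_ in self._fields_}


d = Outer(n=5).as_dict()

try:
    json.dumps(d)
    error = None
except TypeError as exc:
    error = str(exc)  # the message names the offending type
print(error)

# json.dumps calls `default` for every object it cannot serialize,
# so nested structures get converted on demand.
as_json = json.dumps(d, default=lambda obj: obj.as_dict())
print(as_json)
```

If the message names a Structure subclass, the dict still contains a ctypes object somewhere, no matter how it prints.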

"Combining Dumper class with string representer to get exact required YAML output" could be the right approach; however, it looks quite complicated, and I am hoping there is a simpler solution that I just overlooked.
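
For reference, the representer route does not have to be long. The following is a minimal sketch, assuming PyYAML 5.1 or newer (for sort_keys) and using hypothetical stand-in classes Sub and Config plus a hypothetical CtypesDumper name. It emits bare field values without the value/type wrapper used in the question, so it is an illustration of the technique rather than a drop-in replacement.

```python
import ctypes

import yaml


class CtypesDumper(yaml.SafeDumper):
    """SafeDumper subclass, so the stock SafeDumper stays untouched."""


def represent_structure(dumper, data):
    # Emit a Structure as a plain YAML mapping of field name -> value.
    return dumper.represent_dict(
        {name: getattr(data, name) for name, *_ in data._fields_}
    )


def represent_array(dumper, data):
    # Emit a ctypes array as a YAML sequence; nested arrays recurse
    # because their elements are ctypes arrays as well.
    return dumper.represent_list(list(data))


# Multi-representers also match subclasses of the registered type.
CtypesDumper.add_multi_representer(ctypes.Structure, represent_structure)
CtypesDumper.add_multi_representer(ctypes.Array, represent_array)


class Sub(ctypes.Structure):  # hypothetical stand-in for substruct_t
    _pack_ = 1
    _fields_ = [("app", ctypes.c_uint8, 4), ("type", ctypes.c_uint8)]


class Config(ctypes.Structure):  # hypothetical stand-in for ee_struct
    _pack_ = 1
    _fields_ = [
        ("manualSet", ctypes.c_int16),
        ("substruct", Sub),
        ("fileVer", ctypes.c_uint8 * 4),
    ]


cfg = Config(manualSet=100, fileVer=(ctypes.c_uint8 * 4)(65, 66, 104, 0))
cfg.substruct.app = 2
text = yaml.dump(cfg, Dumper=CtypesDumper, sort_keys=False)
print(text)
```

Registering on a Dumper subclass keeps the change local to calls that pass Dumper=CtypesDumper.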

I just found a dirty fix, with the following steps:

  • convert the dict from get_as_dict() to a string
  • replace all ' characters with "
  • use json.loads() on the string to create a new dict, and use that instead

It works, but it just underlines my question: why do the dumpers treat the two dicts differently?

Solution

  • Listing [Python.Docs]: ctypes - A foreign function library for Python.

    The problem is that __recursive_carray_get does what its name suggests (it handles arrays, which is consistent).

    But, when it comes to (sub) structures, it doesn't handle them (or it handles them as any basic type):

    1. So, when the ee_struct instance is serialized to a dictionary (by calling its as_dict method), the value corresponding to the substruct key is actually a substruct_t instance

    2. Due to the fact that __repr__ is overridden (to also use as_dict), when printing the dictionary, the substruct_t instance is also displayed as a dictionary, masking the previous error
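
The masking effect in the two points above can be reproduced in a few lines. This is a minimal sketch with hypothetical stand-in classes Inner and Outer, not the code from the question. It also suggests why a type check may have looked fine (the value/type wrapper is a real dict, only its payload is not) and why the string round trip "fixes" things.

```python
import ctypes
import json


class Inner(ctypes.Structure):
    _fields_ = [("x", ctypes.c_uint8)]

    def __repr__(self) -> str:
        # Same idea as the question's override: look like a dict when printed.
        return str({"x": self.x})


class Outer(ctypes.Structure):
    _fields_ = [("inner", Inner)]


# Built the way the original as_dict does it: the payload is left untouched.
naive = {"inner": {"value": Outer().inner, "type": "Inner"}}

printed = str(naive)
print(printed)                        # reads like nested dicts
print(type(naive["inner"]))           # the wrapper really is a dict
print(type(naive["inner"]["value"]))  # the payload is still an Inner

# Parsing the printed text builds real dicts, which is why the
# str -> json.loads workaround produces something the dumpers accept.
round_trip = json.loads(printed.replace("'", '"'))
print(type(round_trip["inner"]["value"]))
```

The dumpers never see the printed text; they see the objects, so they serialize the Inner instance as a Python object.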

    I fixed the errors in your code, and added some other improvements (with minimum changes).

    code00.py:

    #!/usr/bin/env python
    
    import ctypes as cts
    import sys
    from pprint import pprint as pp
    
    import yaml
    
    
    class SerializableStructure(cts.Structure):
        @classmethod
        def _as_dict(cls, value):
            if isinstance(value, cts.Array):
                ret = [cls._as_dict(e) for e in value]
            elif hasattr(value, "as_dict"):
                ret = value.as_dict()
            else:
                ret = value
            return ret
    
        def __repr__(self) -> str:
            return str(self.as_dict())
    
        def __str__(self) -> str:
            values = ",\n".join(f"{name}={value['value']}" for name, value in self.as_dict().items())
            return f"<{self.__class__.__name__}: {values}>"
    
        def as_dict(self) -> dict:
            return {f[0]: {"value": self._as_dict(getattr(self, f[0])), "type": f[1].__name__}
                    for f in self._fields_}
    
    
    class Substruct_t(SerializableStructure):
        _pack_ = 1
        _fields_ = (
            ("app", cts.c_uint8, 4),
            ("showScale", cts.c_uint8, 2),
            ("showIdleTemp", cts.c_uint8, 2),
            ("type", cts.c_uint8),
        )
    
    
    class EEStruct(SerializableStructure):
        _pack_ = 1
        _fields_ = (
            ("txStatusDelay", cts.c_uint8, 5),
            ("overrideHours", cts.c_uint8, 3),
            ("manualSet", cts.c_int16),
            ("tempOffset", cts.c_int8),
            ("substruct", Substruct_t),
            ("LogArr", (cts.c_uint8 * 6) * 3),
            ("frostTemp", cts.c_int8),
            ("fileVer", cts.c_uint8 * 4),
        )
    
    
    def main(*argv):
        ees = EEStruct()
        marker = "----------"
        print("{:s} Original object {:s}".format(marker, marker))
        print(ees)
        d = ees.as_dict()
        print("\n{:s} Type: {:} {:s}".format(marker, type(d["substruct"]["value"]), marker))  # @TODO - cfati: Check for the old implementation
        print("\n{:s} Dictionary representation {:s}".format(marker, marker))
        pp(d)
        print("\n{:s} YAML representation {:s}".format(marker, marker))
        print(yaml.dump(d))
    
    
    if __name__ == "__main__":
        print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                       64 if sys.maxsize > 0x100000000 else 32, sys.platform))
        rc = main(*sys.argv[1:])
        print("\nDone.\n")
        sys.exit(rc)
    

    Output:

    [cfati@CFATI-5510-0:e:\Work\Dev\StackExchange\StackOverflow\q076458298]> "e:\Work\Dev\VEnvs\py_pc064_03.10_test0\Scripts\python.exe" ./code00.py
    Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)] 064bit on win32
    
    ---------- Original object ----------
    <EEStruct: txStatusDelay=0,
    overrideHours=0,
    manualSet=0,
    tempOffset=0,
    substruct={'app': {'value': 0, 'type': 'c_ubyte'}, 'showScale': {'value': 0, 'type': 'c_ubyte'}, 'showIdleTemp': {'value': 0, 'type': 'c_ubyte'}, 'type': {'value': 0, 'type': 'c_ubyte'}},
    LogArr=[[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]],
    frostTemp=0,
    fileVer=[0, 0, 0, 0]>
    
    ---------- Type: <class 'dict'> ----------
    
    ---------- Dictionary representation ----------
    {'LogArr': {'type': 'c_ubyte_Array_6_Array_3',
                'value': [[0, 0, 0, 0, 0, 0],
                          [0, 0, 0, 0, 0, 0],
                          [0, 0, 0, 0, 0, 0]]},
     'fileVer': {'type': 'c_ubyte_Array_4', 'value': [0, 0, 0, 0]},
     'frostTemp': {'type': 'c_byte', 'value': 0},
     'manualSet': {'type': 'c_short', 'value': 0},
     'overrideHours': {'type': 'c_ubyte', 'value': 0},
     'substruct': {'type': 'Substruct_t',
                   'value': {'app': {'type': 'c_ubyte', 'value': 0},
                             'showIdleTemp': {'type': 'c_ubyte', 'value': 0},
                             'showScale': {'type': 'c_ubyte', 'value': 0},
                             'type': {'type': 'c_ubyte', 'value': 0}}},
     'tempOffset': {'type': 'c_byte', 'value': 0},
     'txStatusDelay': {'type': 'c_ubyte', 'value': 0}}
    
    ---------- YAML representation ----------
    LogArr:
      type: c_ubyte_Array_6_Array_3
      value:
      - - 0
        - 0
        - 0
        - 0
        - 0
        - 0
      - - 0
        - 0
        - 0
        - 0
        - 0
        - 0
      - - 0
        - 0
        - 0
        - 0
        - 0
        - 0
    fileVer:
      type: c_ubyte_Array_4
      value:
      - 0
      - 0
      - 0
      - 0
    frostTemp:
      type: c_byte
      value: 0
    manualSet:
      type: c_short
      value: 0
    overrideHours:
      type: c_ubyte
      value: 0
    substruct:
      type: Substruct_t
      value:
        app:
          type: c_ubyte
          value: 0
        showIdleTemp:
          type: c_ubyte
          value: 0
        showScale:
          type: c_ubyte
          value: 0
        type:
          type: c_ubyte
          value: 0
    tempOffset:
      type: c_byte
      value: 0
    txStatusDelay:
      type: c_ubyte
      value: 0
    
    
    Done.
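
Since the stated goal is to read YAML files as well as write them, here is a minimal sketch of the opposite direction. It assumes the value/type layout produced by as_dict above. The helper names load_into, _assign and _fill_array and the classes Sub and Config are hypothetical, and arrays of structures are not handled.

```python
import ctypes

import yaml


def load_into(target, data: dict) -> None:
    """Copy a {field: {'value': ..., 'type': ...}} dict into a Structure."""
    for name, entry in data.items():
        _assign(target, name, entry["value"])


def _assign(parent, name, value) -> None:
    current = getattr(parent, name)
    if isinstance(current, ctypes.Structure):
        load_into(current, value)     # nested structure: recurse per field
    elif isinstance(current, ctypes.Array):
        _fill_array(current, value)   # arrays: fill element by element
    else:
        setattr(parent, name, value)  # scalars and bitfields


def _fill_array(array, values) -> None:
    for i, item in enumerate(values):
        if isinstance(array[i], ctypes.Array):
            _fill_array(array[i], item)  # multi-dimensional array
        else:
            array[i] = item


class Sub(ctypes.Structure):  # hypothetical stand-in
    _pack_ = 1
    _fields_ = [("app", ctypes.c_uint8, 4), ("type", ctypes.c_uint8)]


class Config(ctypes.Structure):  # hypothetical stand-in
    _pack_ = 1
    _fields_ = [
        ("manualSet", ctypes.c_int16),
        ("substruct", Sub),
        ("LogArr", (ctypes.c_uint8 * 2) * 2),
    ]


text = """
manualSet: {value: 100, type: c_short}
substruct:
  type: Sub
  value:
    app: {value: 2, type: c_ubyte}
    type: {value: 1, type: c_ubyte}
LogArr:
  type: c_ubyte_Array_2_Array_2
  value:
  - [1, 0]
  - [2, 0]
"""

cfg = Config()
load_into(cfg, yaml.safe_load(text))
log_rows = [list(row) for row in cfg.LogArr]
print(cfg.manualSet, cfg.substruct.app, cfg.substruct.type, log_rows)
```

The sketch ignores the type entries; checking them against the field types before assigning would be a reasonable safeguard.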
    
