Tags: python, xml, pydantic

Using pydantic with xml


I am working on a project that uses a lot of XML, and I would like to use pydantic to model the objects. I have simplified the XML here, but it includes a representative example object.

<ns:SomeType name="NameType" shortDescription="some data">
  <ns:Bar
    thingOne="alpha"
    thingTwo="beta"
    thingThree="foobar"/>
</ns:SomeType>

Code

from typing import List, Optional
from xml.etree import ElementTree as ET

from pydantic import BaseModel


class Bar(BaseModel):
    thing_one: str
    thing_two: str
    thing_three: str


class SomeType(BaseModel):
    name: str
    short_description: str
    # default needed: SomeType is built before its Bar child is read
    bar: Optional[Bar] = None


def main() -> List[SomeType]:
    with open("path/to/file.xml") as fp:
        source = fp.read()
    root = ET.fromstring(source)
    some_type_list = []
    # each child of the root element is one SomeType element
    for child in root:
        st = SomeType(
            name=child.attrib["name"],
            short_description=child.attrib["shortDescription"],
        )
        # copy the attributes of the nested Bar element by hand
        for sub in child:
            st.bar = Bar(
                thing_one=sub.attrib["thingOne"],
                thing_two=sub.attrib["thingTwo"],
                thing_three=sub.attrib["thingThree"],
            )
        some_type_list.append(st)
    return some_type_list
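The manual attribute copying above can largely be delegated to pydantic: Element.attrib is a plain dict, so with camelCase aliases it can be passed straight to model_validate. A minimal sketch, assuming pydantic v2 (the answer below also uses model_validate) and assuming the real files declare the ns prefix; the namespace URI urn:example:ns and the helper some_type_from_element are hypothetical.

```python
from typing import Optional
from xml.etree import ElementTree as ET

from pydantic import BaseModel, Field

# Hypothetical namespace URI; ElementTree needs the prefix to be declared.
NS = {"ns": "urn:example:ns"}

XML = """<ns:SomeType xmlns:ns="urn:example:ns" name="NameType" shortDescription="some data">
  <ns:Bar thingOne="alpha" thingTwo="beta" thingThree="foobar"/>
</ns:SomeType>"""


class Bar(BaseModel):
    # aliases match the XML attribute names exactly
    thing_one: str = Field(alias="thingOne")
    thing_two: str = Field(alias="thingTwo")
    thing_three: str = Field(alias="thingThree")


class SomeType(BaseModel):
    name: str
    short_description: str = Field(alias="shortDescription")
    bar: Optional[Bar] = None


def some_type_from_element(elem: ET.Element) -> SomeType:
    # Hypothetical helper: attrib is already a dict of attribute name -> value
    # (xmlns declarations are consumed by the parser, not stored in attrib).
    data = dict(elem.attrib)
    bar_elem = elem.find("ns:Bar", NS)
    if bar_elem is not None:
        # nested dict is validated into a Bar model by pydantic
        data["bar"] = bar_elem.attrib
    return SomeType.model_validate(data)


st = some_type_from_element(ET.fromstring(XML))
print(st)
```

The namespace only matters for locating child elements (find with the NS mapping); the attributes themselves are unprefixed, so no alias has to mention ns.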

I looked into BaseModel.parse_obj and BaseModel.parse_raw, but I don't think they will solve the problem. I also considered using xmltodict to convert the XML to a dict, but then the namespace prefixes and the @ attribute prefixes get even more in the way:

>>> import xmltodict
>>> xmltodict.parse(input_xml)
{'ns:SomeType': {'@name': 'NameType', '@shortDescription': 'some data', ... }}
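Both kinds of prefix can be removed at parse time: xmltodict.parse accepts an attr_prefix argument (default "@") and a postprocessor callback that receives (path, key, value) for every element and attribute and returns the (key, value) pair to keep. A minimal sketch, assuming the ns: prefix can simply be discarded; strip_prefix is a hypothetical helper. Note that with attr_prefix="" attributes and child elements share one key space, so a child element named like an attribute would collide.

```python
import xmltodict

input_xml = """<ns:SomeType name="NameType" shortDescription="some data">
  <ns:Bar thingOne="alpha" thingTwo="beta" thingThree="foobar"/>
</ns:SomeType>"""


def strip_prefix(path, key, value):
    # Hypothetical helper: called for element and attribute keys alike;
    # "ns:Bar" -> "Bar", while unprefixed keys are returned unchanged.
    return key.split(":", 1)[-1], value


# attr_prefix="" drops the "@" that xmltodict normally puts on attribute keys
data = xmltodict.parse(input_xml, attr_prefix="", postprocessor=strip_prefix)
print(data)
```

With keys like these, the pydantic models only need camelCase aliases (or none, if the field names match the XML names).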

Solution

  • xmltodict can help in your example if you combine it with field aliases:

    from typing import Optional
    
    import xmltodict
    from pydantic import BaseModel, Field
    
    
    # xmltodict prefixes attribute keys with "@" and keeps the "ns:" prefix on
    # element names, so the aliases must match those keys exactly.
    class Bar(BaseModel):
        thing_one: str = Field(alias="@thingOne")
        thing_two: str = Field(alias="@thingTwo")
        thing_three: str = Field(alias="@thingThree")
    
    
    class SomeType(BaseModel):
        name: str = Field(alias="@name")
        short_description: str = Field(alias="@shortDescription")
        # default=None keeps bar optional when <ns:Bar> is missing; pydantic v2
        # treats Optional[...] without a default as a required field
        bar: Optional[Bar] = Field(default=None, alias="ns:Bar")
    
    
    class Root(BaseModel):
        some_type: SomeType = Field(alias="ns:SomeType")
    
    
    xml = """<ns:SomeType name="NameType" shortDescription="some data">
      <ns:Bar
        thingOne="alpha"
        thingTwo="beta"
        thingThree="foobar"/>
    </ns:SomeType>"""
    
    print(Root.model_validate(xmltodict.parse(xml)).some_type)
    

    Output:

    name='NameType' short_description='some data' bar=Bar(thing_one='alpha', thing_two='beta', thing_three='foobar')
    

    As the example shows, a Root model is needed because xmltodict returns a dict whose single top-level key is ns:SomeType.
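Writing an "@..." alias on every attribute field gets repetitive in larger schemas. A minimal sketch of one way to avoid that, assuming pydantic v2: an alias_generator on a shared base class derives "@camelCase" aliases from the snake_case field names, while an explicit Field(alias=...) still takes priority for child elements. The helper xml_attr_alias and the base class XmlModel are hypothetical names.

```python
from typing import Optional

import xmltodict
from pydantic import BaseModel, ConfigDict, Field


def xml_attr_alias(field_name: str) -> str:
    """Hypothetical helper: "thing_one" -> "@thingOne" (xmltodict attribute key)."""
    first, *rest = field_name.split("_")
    return "@" + first + "".join(word.capitalize() for word in rest)


class XmlModel(BaseModel):
    # every field without an explicit alias gets one from xml_attr_alias
    model_config = ConfigDict(alias_generator=xml_attr_alias)


class Bar(XmlModel):
    thing_one: str
    thing_two: str
    thing_three: str


class SomeType(XmlModel):
    name: str
    short_description: str
    # child element, not an attribute: the explicit alias overrides the generator
    bar: Optional[Bar] = Field(default=None, alias="ns:Bar")


class Root(XmlModel):
    some_type: SomeType = Field(alias="ns:SomeType")


xml = """<ns:SomeType name="NameType" shortDescription="some data">
  <ns:Bar thingOne="alpha" thingTwo="beta" thingThree="foobar"/>
</ns:SomeType>"""

some_type = Root.model_validate(xmltodict.parse(xml)).some_type
print(some_type)
```

The generator only handles attributes; element names still need their own aliases because their ns: prefix cannot be derived from the field name.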