python-3.x xml-parsing data-wrangling xmltocsv

XML Parsing with Decode in Python


The XML file I'm trying to read starts with b':

    b'<?xml version="1.0" encoding="UTF-8" ?><root><property_id type="dict"><n53987 type="int">54522</n53987><n65731 type="int">66266</n65731><n44322 type="int">44857</n44322><n11633 type="int">12148</n11633><n28192 type="int">28727</n28192><n69053 type="int">69588</n69053><n26529 type="int">27064</n26529><n4844 type="int">4865</n4844><n7625 type="int">7646</n7625><n54697 type="int">55232</n54697><n6210 type="int">6231</n6210><n26710 type="int">27245</n26710><n57915 type="int">58450</n57915

    import xml.etree.ElementTree as etree
    tree = etree.decode("UTF-8").parse("./property.xml")

How can I decode this file, and then read the dict-typed data afterwards?
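For context, the leading b' suggests (an assumption about how the file was created) that it holds the Python repr of a bytes object, for example written with str(data) instead of data.decode(). The sketch below reproduces that with a shortened, closed-off excerpt of the sample, shows that ElementTree rejects it, and shows that writing the decoded text instead parses fine. The file path is a throwaway temporary file.

```python
import os
import tempfile
import xml.etree.ElementTree as etree

# Shortened, closed-off excerpt of the sample data (assumed original bytes).
data = (b'<?xml version="1.0" encoding="UTF-8" ?><root><property_id type="dict">'
        b'<n4844 type="int">4865</n4844></property_id></root>')

# Hypothetical mistake: str(data) writes the repr "b'<?xml ...'", not the XML text.
path = os.path.join(tempfile.mkdtemp(), "property.xml")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(str(data))

with open(path, encoding="utf-8") as fh:
    head = fh.read()[:7]
print(head)  # b'<?xml

# The parser sees "b'" before the first "<", so parsing fails.
try:
    etree.parse(path)
    parse_failed = False
except etree.ParseError as exc:
    parse_failed = True
    print("ParseError:", exc)

# Writing the decoded text instead produces a file ElementTree can parse.
with open(path, "w", encoding="utf-8") as fh:
    fh.write(data.decode("utf-8"))
root_tag = etree.parse(path).getroot().tag
print(root_tag)  # root
```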


Solution

  • You can try the following; note that it returns an Element instance (the root element), not an ElementTree:

    import ast
    import xml.etree.ElementTree as etree

    with open("property.xml", "r", encoding="utf-8") as xml_file:
        content = xml_file.read()

    # convert the string representation of bytes ("b'...'") back to bytes
    raw_xml_bytes = ast.literal_eval(content)

    # parse the XML from the raw bytes; this returns the root Element
    root = etree.fromstring(raw_xml_bytes)
    

  • Another way is to decode the bytes, write the decoded XML text to a new file, and then parse that file. This returns an ElementTree instance rather than an Element:

    import ast
    import xml.etree.ElementTree as etree

    with open("property.xml", "r", encoding="utf-8") as xml_file:
        content = xml_file.read()

    # convert the string representation of bytes back to bytes
    raw_xml_bytes = ast.literal_eval(content)

    # save the decoded XML text to a new file
    with open("output.xml", "w", encoding="utf-8") as file_obj:
        file_obj.write(raw_xml_bytes.decode("utf-8"))

    # parse the saved file; etree.parse() expects a filename or file object,
    # not the string that was read earlier
    tree = etree.parse("output.xml")
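Neither approach covers the second part of the question, reading the dict-typed property_id element. A minimal sketch, assuming every child of property_id is a leaf whose type attribute names a Python type (only "int" appears in the sample, so the "float" and "str" converters are assumptions) and using a hypothetical helper name, load_property_dict:

```python
import ast
import xml.etree.ElementTree as etree

# Maps the type="..." attribute to a converter. Only "int" appears in the
# question's sample; "float" and "str" are assumptions.
CONVERTERS = {"int": int, "float": float, "str": str}


def load_property_dict(text):
    """Hypothetical helper: turn the "b'...'" file text into a plain dict."""
    raw_xml_bytes = ast.literal_eval(text)   # "b'...'" -> bytes
    root = etree.fromstring(raw_xml_bytes)   # bytes -> root Element
    container = root.find("property_id")     # the type="dict" element
    result = {}
    for child in container:
        # unknown or missing type attributes fall back to plain strings
        convert = CONVERTERS.get(child.get("type"), str)
        result[child.tag] = None if child.text is None else convert(child.text)
    return result


# Usage with a shortened excerpt of the sample; for the real file, pass
# open("property.xml", encoding="utf-8").read() instead.
sample = str(b'<?xml version="1.0" encoding="UTF-8" ?><root><property_id type="dict">'
             b'<n53987 type="int">54522</n53987><n4844 type="int">4865</n4844>'
             b'</property_id></root>')
properties = load_property_dict(sample)
print(properties)  # {'n53987': 54522, 'n4844': 4865}
```

If you need an ElementTree instance instead of the root Element, etree.ElementTree(root) wraps the element directly, without going through a temporary file. The keys are kept as the tag names (n53987 and so on); whether the leading "n" should be stripped to recover numeric keys depends on how the file was generated, which the question does not say.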