python xml python-3.x minidom

Reading XML with a different encoding using Python minidom


I wrote a script reading XML files using minidom:

from xml.dom.minidom import parse

# Data['FileList'] holds the paths of the XML files to read
for File in Data['FileList']:
    Xml = parse(File)
    # do something

This runs fine, but some people create XML files that declare UTF-8 encoding in the XML header and use German umlauts in tags, so I run into xml.parsers.expat.ExpatError: not well-formed (invalid token).

If I manually change the declaration in the XML to encoding="ISO-8859-1", it runs fine.
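
The behaviour can be reproduced with a small sketch. It assumes the problem files are actually stored as ISO-8859-1 bytes while the header claims UTF-8; the sample document and the helper name `try_parse` are made up for illustration.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Made-up sample: an umlaut in a tag name, stored as ISO-8859-1 bytes.
body = "<root><Größe>1</Größe></root>".encode("iso-8859-1")


def try_parse(declared_encoding):
    """Parse the sample bytes under the given declared encoding (illustrative helper)."""
    header = '<?xml version="1.0" encoding="{}"?>'.format(declared_encoding)
    try:
        return parseString(header.encode("ascii") + body)
    except ExpatError as exc:
        # The parser trusts the declaration, so mismatched bytes are rejected.
        print("declared", declared_encoding, "->", exc)
        return None


try_parse("UTF-8")             # the Latin-1 bytes are not valid UTF-8
doc = try_parse("ISO-8859-1")  # same bytes, matching declaration
```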

Is there a more elegant way of changing the encoding, instead of editing the XML files, e.g. telling minidom to use a different encoding than defined in the XML?


Solution

  • I suggest this solution:

    Before parsing the file, open it normally and replace its first line, which is the XML header, with this line:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    

    Then save the file and pass it to the minidom.parse() function.

    This may help you replace the first line in each file: Search and replace a line in a file in Python
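
A variation on this idea that avoids rewriting the files on disk is to swap the header in memory and hand the bytes to minidom.parseString(). This is only a sketch: the helper name `parse_with_encoding` is hypothetical, and it assumes the XML declaration, if present, sits at the very start of the file and that the content really is ISO-8859-1.

```python
import re
from xml.dom.minidom import parseString

# Matches an XML declaration at the very start of the data.
_XML_DECL = re.compile(rb'^<\?xml[^>]*\?>')


def parse_with_encoding(path, encoding="ISO-8859-1"):
    """Parse an XML file, overriding the encoding named in its header.

    Hypothetical helper; the file on disk is left untouched.
    """
    with open(path, "rb") as fh:
        raw = fh.read()

    new_decl = '<?xml version="1.0" encoding="{}"?>'.format(encoding).encode("ascii")
    if _XML_DECL.match(raw):
        # Replace the existing declaration with the corrected one.
        raw = _XML_DECL.sub(new_decl, raw, count=1)
    else:
        # No declaration at all: prepend one.
        raw = new_decl + raw

    return parseString(raw)
```

In the loop from the question, `Xml = parse(File)` would then become `Xml = parse_with_encoding(File)`. Files that are genuinely UTF-8 would be misread this way, so the override should only be applied to the files that fail to parse as declared.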