
Remove "encoding" attribute from XML in Python


I am using Python to make some conditional changes to an XML document. The incoming document has <?xml version="1.0" ?> at the top.

I'm using xml.etree.ElementTree.

How I'm serializing the changed XML:

filter_update_body = ET.tostring(root, encoding="utf8", method="xml")

The output has this at the top:

<?xml version='1.0' encoding='utf8'?>

The client wants the "encoding" attribute removed, but if I remove it, the output either doesn't include the declaration line at all or contains encoding='us-ascii'.

Can this be done so the output matches: <?xml version="1.0" ?>?

(Honestly, I don't know why it matters, but that's what I was told needs to happen.)
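A minimal reproduction of the behavior described in the question, assuming Python 3.8+ (needed for the xml_declaration parameter) and a hypothetical sample document in place of the real one. ElementTree does not keep the incoming declaration when parsing; tostring() decides on its own whether to write one. By default it writes one only when the encoding is something other than "utf-8", "us-ascii" or "unicode", and the spelling "utf8" is not recognized as "utf-8" by that check, which is why the call above produces a declaration. Forcing a declaration with the default encoding is presumably where the us-ascii variant came from.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the asker's document
root = ET.fromstring("<filter><item>1</item></filter>")

# "utf8" is not treated as "utf-8", so ElementTree writes a declaration
with_utf8 = ET.tostring(root, encoding="utf8", method="xml")

# Default encoding (US-ASCII): no declaration at all
no_decl = ET.tostring(root, method="xml")

# Forcing a declaration (Python 3.8+) declares the default us-ascii encoding
forced = ET.tostring(root, method="xml", xml_declaration=True)

for label, out in [("utf8", with_utf8), ("default", no_decl), ("forced", forced)]:
    print(label, out.splitlines()[0])
```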


Solution

  • As pointed out in this answer, there is no way to make ElementTree write an XML declaration without the encoding attribute. However, as @James suggested in a comment, the attribute can be stripped from the resulting output like this:

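    import xml.etree.ElementTree as ET
    filter_update_body = ET.tostring(root, encoding="utf8", method="xml")
    # count=1: strip only the first match, i.e. the one in the XML declaration
    filter_update_body = filter_update_body.replace(b"encoding='utf8'", b"", 1)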
    

    The b prefixes are required because ET.tostring() returns a bytes object whenever encoding != "unicode". That means bytes.replace() is being called, and its arguments must be bytes as well; passing str arguments raises a TypeError.

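    With encoding="unicode" (note that this is the literal string "unicode"), tostring() returns a regular str instead, so the b prefixes can be omitted and good old str.replace() does the job. Be aware, though, that ElementTree writes no XML declaration at all for "unicode" (just as for "utf-8" and "us-ascii") unless you pass xml_declaration=True (available since Python 3.8), and the encoding it then declares does not come from the "utf8" argument used above, so the string to strip has to match the declaration that is actually produced.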

    It's worth noting that the choice between bytes and str also affects how the XML is eventually written to a file: a bytes object must be written in binary mode ("wb"), a str in text mode ("w"), preferably with an explicit encoding such as encoding="utf-8".
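The bytes route from the answer as a self-contained script, with a hypothetical sample document standing in for the real root. It mainly shows what the stripped declaration actually looks like:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<filter><item>1</item></filter>")  # hypothetical sample

body = ET.tostring(root, encoding="utf8", method="xml")
assert isinstance(body, bytes)  # encoding != "unicode", so bytes come back

# Both arguments to bytes.replace() must be bytes; count=1 limits it to the declaration
body = body.replace(b"encoding='utf8'", b"", 1)

print(body.splitlines()[0])  # b"<?xml version='1.0' ?>"
```

ElementTree writes the declaration with single quotes, so the result is <?xml version='1.0' ?>: equivalent XML, but not character-for-character the <?xml version="1.0" ?> the client asked for. If an exact match is required, replacing the whole declaration, e.g. body.replace(b"<?xml version='1.0' encoding='utf8'?>", b'<?xml version="1.0" ?>', 1), is one option.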
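For the str route, here is a sketch of a different technique than the stripping described in the answer: since encoding="unicode" yields no declaration by default, the exact declaration the client wants can simply be prepended. The DECLARATION constant is a hypothetical name. With no encoding declared, XML parsers assume UTF-8 (absent a byte order mark), so the text should later be saved as UTF-8.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<filter><item>1</item></filter>")  # hypothetical sample

# encoding="unicode" returns str and, by default, no XML declaration
body = ET.tostring(root, encoding="unicode", method="xml")
assert isinstance(body, str) and not body.startswith("<?xml")

# Prepend the exact declaration the client asked for (hypothetical constant name)
DECLARATION = '<?xml version="1.0" ?>\n'
filter_update_body = DECLARATION + body

print(filter_update_body.splitlines()[0])  # <?xml version="1.0" ?>
```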
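A sketch of the point about file modes, writing a bytes variant and a str variant to a temporary directory (the file names are hypothetical). Writing a str to a file opened in "wb" mode, or bytes to one opened in "w" mode, raises a TypeError.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.fromstring("<filter><item>1</item></filter>")  # hypothetical sample

# bytes variant: declaration attribute stripped as in the answer
as_bytes = ET.tostring(root, encoding="utf8").replace(b"encoding='utf8'", b"", 1)
# str variant: ElementTree writes no declaration for "unicode", so add one by hand
as_str = '<?xml version="1.0" ?>\n' + ET.tostring(root, encoding="unicode")

tmpdir = tempfile.mkdtemp()
bytes_path = os.path.join(tmpdir, "from_bytes.xml")  # hypothetical file names
str_path = os.path.join(tmpdir, "from_str.xml")

# bytes must go to a file opened in binary mode; they are written unchanged
with open(bytes_path, "wb") as f:
    f.write(as_bytes)

# str goes to a file opened in text mode; UTF-8 matches a declaration without encoding
with open(str_path, "w", encoding="utf-8") as f:
    f.write(as_str)
```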