I have an ISM file (an InstallShield project) that is formatted as XML.
I need to change some attributes in the file, so I used xml.etree.ElementTree (from the Python standard library).
I can find the values and change them. However, after saving the file with the updated values, I can't open it in InstallShield (I get a generic error saying the file can't be opened).
When I compare the old file with the new one, I see that besides the values I changed, some lines are simply missing from the new XML, and in some lines the tag names have changed.
Why does this happen? Is there a way to keep the file exactly as it was except for the changes I made? Should I use a different tool to make the change?
For example, the following section appears in original XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="is.xsl" ?>
<!DOCTYPE msi [
<!ELEMENT msi (summary,table*)>
<!ATTLIST msi version CDATA #REQUIRED>
<!ATTLIST msi xmlns:dt CDATA #IMPLIED
codepage CDATA #IMPLIED
compression (MSZIP|LZX|none) "LZX">
<!ELEMENT summary (codepage?,title?,subject?,author?,keywords?,comments?,
template,lastauthor?,revnumber,lastprinted?,
createdtm?,lastsavedtm?,pagecount,wordcount,
charcount?,appname?,security?)>
<!ELEMENT codepage (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT keywords (#PCDATA)>
<!ELEMENT comments (#PCDATA)>
<!ELEMENT template (#PCDATA)>
<!ELEMENT lastauthor (#PCDATA)>
<!ELEMENT revnumber (#PCDATA)>
<!ELEMENT lastprinted (#PCDATA)>
<!ELEMENT createdtm (#PCDATA)>
<!ELEMENT lastsavedtm (#PCDATA)>
<!ELEMENT pagecount (#PCDATA)>
<!ELEMENT wordcount (#PCDATA)>
<!ELEMENT charcount (#PCDATA)>
<!ELEMENT appname (#PCDATA)>
<!ELEMENT security (#PCDATA)>
<!ELEMENT table (col+,row*)>
<!ATTLIST table
name CDATA #REQUIRED>
<!ELEMENT col (#PCDATA)>
<!ATTLIST col
key (yes|no) #IMPLIED
def CDATA #IMPLIED>
<!ELEMENT row (td+)>
<!ELEMENT td (#PCDATA)>
<!ATTLIST td
href CDATA #IMPLIED
dt:dt (string|bin.base64) #IMPLIED
md5 CDATA #IMPLIED>
]>
<msi version="2.0" xmlns:dt="urn:schemas-microsoft-com:datatypes" codepage="65001">
But in the new XML it's gone and instead there is only:
<msi xmlns:ns0="urn:schemas-microsoft-com:datatypes" codepage="65001" version="2.0">
There are more differences, this is just an example.
The Python code I use to make the change is:
import xml.etree.ElementTree as Et

tree = Et.parse(ism_file_path)
root = tree.getroot()
# Walk the two levels below the root and update the package code.
for attributes_group in root:
    for attribute in attributes_group:
        if attribute.tag == "revnumber":
            new_package_code = increment_hex_number(attribute.text)
            attribute.text = new_package_code
tree.write(ism_file_path)
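A minimal reproduction of the difference shown above, using a shortened stand-in for the ISM header (the SAMPLE document is made up for illustration, not taken from a real project). It suggests that xml.etree.ElementTree does not keep the DOCTYPE or the stylesheet instruction when it builds the tree, and that it generates prefixes such as ns0 for namespaces that have no registered prefix. Registering the prefix brings dt back, but the DOCTYPE is still lost.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up stand-in for the ISM header (assumption for illustration).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="is.xsl" ?>
<!DOCTYPE msi [
<!ELEMENT msi (summary)>
]>
<msi version="2.0" xmlns:dt="urn:schemas-microsoft-com:datatypes">
<summary><revnumber dt:dt="string">1A</revnumber></summary>
</msi>"""

root = ET.fromstring(SAMPLE)

# Serialize without any registered prefix: ElementTree invents one (ns0).
before = ET.tostring(root, encoding="unicode")

# Register the original prefix, then serialize again.
ET.register_namespace("dt", "urn:schemas-microsoft-com:datatypes")
after = ET.tostring(root, encoding="unicode")

print(before)
print(after)
```

In both outputs the DOCTYPE and the xml-stylesheet line are absent, which matches the missing lines in the rewritten file.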
Thank you!
Eventually I moved to a different library: lxml.
Unlike xml.etree.ElementTree, this library keeps the original namespace prefixes and attribute order,
so I made exactly the same change with it and it worked:
from lxml import etree

def modify_ism_file(ism_file_path):
    # iterparse yields each element once it has been fully parsed ("end" events).
    context = etree.iterparse(ism_file_path)
    for action, attributes_group in context:
        for attribute in attributes_group:
            if attribute.tag == "revnumber":
                print("Found package code. TAG = {0} TEXT = {1}".format(attribute.tag, attribute.text))
                new_package_code = increment_hex_number(attribute.text)
                print("New package code is {0}".format(new_package_code))
                attribute.text = new_package_code
    # tostring() returns bytes when an encoding is given, so write in binary mode.
    obj_xml = etree.tostring(context.root, pretty_print=True, xml_declaration=True, encoding="utf-8")
    with open(ism_file_path, "wb") as f:
        f.write(obj_xml)
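Both snippets call increment_hex_number, which is not shown. The following is only a hypothetical sketch of such a helper, assuming the text is a plain string of hexadecimal digits; the real revnumber format in an ISM file may differ (for example, a GUID), in which case this would need to be adapted.

```python
def increment_hex_number(hex_text):
    """Hypothetical helper: add 1 to a hex string, keeping its width and letter case."""
    digits = hex_text.strip()
    value = int(digits, 16) + 1
    # Keep upper-case output if the input had no lower-case hex letters.
    fmt = "X" if digits == digits.upper() else "x"
    # zfill preserves leading zeros; the result grows by one digit on overflow.
    return format(value, fmt).zfill(len(digits))


print(increment_hex_number("0009"))
print(increment_hex_number("1a"))
```

The width and case handling is a design choice made here so that the rewritten value looks like the original one; it is not something the original code confirms.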