"Invalid tag name" error when creating element with lxml in python


I am using lxml to create an XML file. My sample program is:

from datetime import datetime
from lxml import etree
MESSAGETYPEINDIC = 'CRS701'
REPPERIOD = datetime.now().strftime("%Y-%m-%d")
root = etree.Element("crsdac2:CRS-DAC2-LT", attrib={'xmlns:crsdac2': 'urn:sti:ties:crsdac2:v1', 'xmlns:crs': 'urn:sti:ties:sask:v1','xmlns:xsi':'http://www.w3.org/2001/XMLSchema-instance', 'version':'3.141590118408203125', 'xsi:schemaLocation': 'urn:sti:ties:crsdac2:v1 file:///G:/Tax/Tax%20Technology/CRS%20(DAC2)/XML%20Specifikacija%20(versija%20nuo%202020-12)/CRS-DAC2-LT_v0.4.xsd' })
crsDAC2_messageSpec = etree.SubElement(root, "crsdac2:MessageSpec")
crsDAC2_messageSpec_messagetypeindic = etree.SubElement(crsDAC2_messageSpec, "crs:MessageTypeIndic").text = MESSAGETYPEINDIC
crsDAC2_messageSpec_repperiod = etree.SubElement(crsDAC2_messageSpec, "crs:ReportingPeriod").text = REPPERIOD
crsDAC2_messageBody = etree.SubElement(root, "crsdac2:MessageBody")
tree = etree.ElementTree(root)
print(tree)
tree_string = etree.tostring(tree, pretty_print=True, xml_declaration=True, encoding='UTF-8', standalone="yes")
print(tree_string)

When I run the code above, I get the error below. How can I resolve it?

ValueError: Invalid tag name 'crsdac2:CRS-DAC2-LT'

I need output like the following:

<?xml version="1.0" encoding="UTF-8"?>
<crsdac2:CRS-DAC2-LT xmlns:crsdac2="urn:sti:ties:crsdac2:v1" xmlns:crs="urn:sti:ties:crstypessti:v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="3.141590118408203125" xsi:schemaLocation="urn:sti:ties:crsdac2:v1 file:///G:/Tax/Tax%20Technology/CRS%20(DAC2)/XML%20Specifikacija%20(versija%20nuo%202020-12)/CRS-DAC2-LT_v0.4.xsd">
    <crsdac2:MessageSpec>
            <crs:MessageTypeIndic>CRS701</crs:MessageTypeIndic>
            <crs:ReportingPeriod>2021-12-31</crs:ReportingPeriod>   
    </crsdac2:MessageSpec>
    <crsdac2:MessageBody>
    </crsdac2:MessageBody>
</crsdac2:CRS-DAC2-LT>

Solution

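  • When creating an element or attribute bound to a namespace, you need to use the namespace URI (not the prefix) in the name. lxml rejects a tag such as "crsdac2:CRS-DAC2-LT" because the colon is not allowed in a plain tag name. Declare the prefix-to-URI mapping with the nsmap argument rather than with xmlns attributes, and let the serializer write the prefixes. I suggest using the QName helper class to build the names.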

    from lxml.etree import Element, SubElement, QName, tostring
    from datetime import datetime
    
    ns1 = "urn:sti:ties:crsdac2:v1"
    ns2 = "urn:sti:ties:crstypessti:v1"
    ns3 = 'http://www.w3.org/2001/XMLSchema-instance'
    
    xsd = "file:///G:/Tax/Tax%20Technology/CRS%20(DAC2)/XML%20Specifikacija%20(versija%20nuo%202020-12)/CRS-DAC2-LT_v0.4.xsd"
    
    MESSAGETYPEINDIC = 'CRS701'
    REPPERIOD = datetime.now().strftime("%Y-%m-%d")
    
    root = Element(QName(ns1, "CRS-DAC2-LT"), nsmap={"crsdac2": ns1, "crs": ns2})
    # xsi:schemaLocation holds "namespace-URI location" pairs, as in the desired output
    root.set(QName(ns3, "schemaLocation"), f"{ns1} {xsd}")
    root.set("version", "3.141590118408203125")
    
    messageSpec = SubElement(root, QName(ns1, "MessageSpec"))
    
    messageTypeIndic = SubElement(messageSpec, QName(ns2, "MessageTypeIndic"))
    messageTypeIndic.text = MESSAGETYPEINDIC
    
    messageSpec_repperiod = SubElement(messageSpec, QName(ns2, "ReportingPeriod"))
    messageSpec_repperiod.text = REPPERIOD
    
    messageBody = SubElement(root, QName(ns1, "MessageBody"))
    
    tree_string = tostring(root, pretty_print=True, xml_declaration=True,
                           encoding='UTF-8', standalone="yes")
    print(tree_string.decode())
    

    Output:

    <?xml version='1.0' encoding='UTF-8' standalone='yes'?>
    <crsdac2:CRS-DAC2-LT xmlns:crs="urn:sti:ties:crstypessti:v1" xmlns:crsdac2="urn:sti:ties:crsdac2:v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:sti:ties:crsdac2:v1 file:///G:/Tax/Tax%20Technology/CRS%20(DAC2)/XML%20Specifikacija%20(versija%20nuo%202020-12)/CRS-DAC2-LT_v0.4.xsd" version="3.141590118408203125">
      <crsdac2:MessageSpec>
        <crs:MessageTypeIndic>CRS701</crs:MessageTypeIndic>
        <crs:ReportingPeriod>2022-12-20</crs:ReportingPeriod>
      </crsdac2:MessageSpec>
      <crsdac2:MessageBody/>
    </crsdac2:CRS-DAC2-LT>
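
If you prefer not to use QName, the same names can be written directly in Clark notation, "{namespace-uri}localname", which is what QName produces as its text. The following is only a minimal sketch of that alternative: the helper name build_message_spec and the constants NS_CRSDAC2 and NS_CRS are made up for illustration, and the tree is a cut-down version of the one above. It also shows that the prefixed tag from the question is rejected.

```python
from lxml import etree

# Namespace URIs taken from the desired output in the question.
NS_CRSDAC2 = "urn:sti:ties:crsdac2:v1"
NS_CRS = "urn:sti:ties:crstypessti:v1"


def build_message_spec(message_type):
    """Hypothetical helper: build a small namespaced tree using Clark notation."""
    # nsmap only declares which prefix the serializer should use for each URI.
    nsmap = {"crsdac2": NS_CRSDAC2, "crs": NS_CRS}
    root = etree.Element(f"{{{NS_CRSDAC2}}}CRS-DAC2-LT", nsmap=nsmap)
    spec = etree.SubElement(root, f"{{{NS_CRSDAC2}}}MessageSpec")
    indic = etree.SubElement(spec, f"{{{NS_CRS}}}MessageTypeIndic")
    indic.text = message_type
    return root


root = build_message_spec("CRS701")

# QName(uri, local).text is the same Clark-notation string used above.
clark_tag = etree.QName(NS_CRSDAC2, "CRS-DAC2-LT").text
print(root.tag == clark_tag)

# The prefix is looked up from nsmap; it is not part of the tag itself.
print(root.prefix, etree.QName(root).localname)
print(etree.tostring(root, pretty_print=True).decode())

# The prefixed form from the question fails validation of the tag name.
try:
    etree.Element("crsdac2:CRS-DAC2-LT")
except ValueError as exc:
    print("ValueError:", exc)
```

QName is mostly a readability choice here: both forms end up as the same tag on the element, and the prefixes only appear when the tree is serialized.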