Python XML parsing with CDATA


I have the XML below and need to update a value inside the CDATA section, in the <f1> tag. I used ElementTree with an XPath down to vsData, and I can read the CDATA and update the value of f1. The problem is that after the update, the resulting XML contains only the content of the CDATA section; the rest of the document is gone.

# xpath points to the vsData element
rootElement = rootElement.findall(xpath)[0]
# parse the CDATA text as XML
rootElement = et.fromstring(next(rootElement.iter()).text)
for each in rootElement[0]:
    if each.tag == paramname:
        each.text = str(valueToSet)
        print(each.tag, each.text)

<config>
<subconfig>
  <a>First Cell</a>
  <b>Second Cell</b>
  <vsDataContainer>
      <id>0</id>
       <vsData><![CDATA[
          <g>
            <f>
              <f1>10</f1>
              <f2>20</f2>
              <f3>30</f3>
            </f>
          </g>
        ]]></vsData>
    </vsDataContainer>
</subconfig>
</config>


After the update, the new XML contains only the following:
 <g>
    <f>
       <f1>50</f1>
       <f2>20</f2>
       <f3>30</f3>
    </f>
</g>

But I need the original document with only the value of f1 changed, as shown below. Could somebody help with this?

<config>
<subconfig>
  <a>First Cell</a>
  <b>Second Cell</b>
  <vsDataContainer>
      <id>0</id>
       <vsData><![CDATA[
          <g>
            <f>
              <f1>50</f1>
              <f2>20</f2>
              <f3>30</f3>
            </f>
          </g>
        ]]></vsData>
    </vsDataContainer>
</subconfig>
</config>

Solution

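  • The outer document is lost because rootElement is rebound to the tree parsed from the CDATA text. Keep the outer root in its own variable, parse the CDATA text separately, update f1 there, and write the serialized inner XML back into vsData before serializing the outer root: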

    import xml.etree.ElementTree as ET
    
    xml = '''<config>
    <subconfig>
      <a>First Cell</a>
      <b>Second Cell</b>
      <vsDataContainer>
          <id>0</id>
           <vsData><![CDATA[
              <g>
                <f>
                  <f1>10</f1>
                  <f2>20</f2>
                  <f3>30</f3>
                </f>
              </g>
            ]]></vsData>
        </vsDataContainer>
    </subconfig>
    </config>'''
    
    f1_new_value = '999'
    root = ET.fromstring(xml)
    vs_data = root.find('.//vsData')
    inner_xml = vs_data.text.strip()
    inner_root = ET.fromstring(inner_xml)
    inner_root.find('.//f1').text = f1_new_value
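    # ElementTree has no CDATA node type, so write the markers as plain text
    vs_data.text = '<![CDATA[' + ET.tostring(inner_root, encoding='unicode') + ']]>'
    root_str = ET.tostring(root, encoding='unicode')
    # Serialization escapes '<' and '>' in text; undo that to restore the CDATA block.
    # Note: this also unescapes any &lt; / &gt; elsewhere in the document.
    root_str = root_str.replace('&lt;', '<').replace('&gt;', '>')
    print(root_str)
    

    Output:

    <config>
    <subconfig>
      <a>First Cell</a>
      <b>Second Cell</b>
      <vsDataContainer>
          <id>0</id>
           <vsData><![CDATA[<g>
                <f>
                  <f1>999</f1>
                  <f2>20</f2>
                  <f3>30</f3>
                </f>
              </g>]]></vsData>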
        </vsDataContainer>
    </subconfig>
    </config>
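
  • A variant that may be safer when the outer document contains escaped text of its own: instead of unescaping the whole serialized string, put a placeholder token into vsData and swap it for the CDATA block after serialization. This is only a sketch under assumptions; the function name update_cdata_value, the PLACEHOLDER token, and the shortened sample document are made up for illustration, and the token is assumed not to occur anywhere in the real document.

```python
import xml.etree.ElementTree as ET

# Hypothetical token; assumed not to appear anywhere in the document.
PLACEHOLDER = '__CDATA_PLACEHOLDER__'


def update_cdata_value(xml_text, holder_path, inner_path, new_value):
    """Update one element inside an XML fragment stored as CDATA text."""
    root = ET.fromstring(xml_text)
    holder = root.find(holder_path)  # element whose text is the CDATA content

    # Parse the CDATA text as its own document and change the target element.
    inner_root = ET.fromstring(holder.text.strip())
    inner_root.find(inner_path).text = str(new_value)
    inner_xml = ET.tostring(inner_root, encoding='unicode')

    # Serialize the outer tree with a placeholder, then substitute the
    # CDATA block so that no other escaped text is touched.
    holder.text = PLACEHOLDER
    outer_xml = ET.tostring(root, encoding='unicode')
    return outer_xml.replace(PLACEHOLDER, '<![CDATA[' + inner_xml + ']]>', 1)


# Shortened, made-up sample; <a> holds escaped text that must stay escaped.
sample = '''<config>
<subconfig>
  <a>1 &lt; 2</a>
  <vsData><![CDATA[
    <g><f><f1>10</f1><f2>20</f2></f></g>
  ]]></vsData>
</subconfig>
</config>'''

result = update_cdata_value(sample, './/vsData', './/f1', 50)
print(result)
```

    Because only the placeholder is replaced, escaped characters outside vsData stay escaped, and the result can be parsed again with ET.fromstring.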