Read XML items with Python


I have an XML file from which I need to extract values. The file is much longer, but a sample of it is included below. I need to extract the key and value information, but so far I can't even read it, since Python does not recognize any columns called key or value.

Here is a sample of the data I need to extract from:

</info-forzut-change-manifest>
  <repository-location derived-from='/datasources/iteminfos?rev=1.0' id='iteminfos' path='/datasources' revision='1.1' />
  <connection class='federated'>
    <named-connections>
      <named-connection caption='po-yq-ss.ttt.companyx.com' name='SZ'>
      <relation join='left' type='join'>
        <clause type='join'>
          <expression op='AND'>
            <expression op='='>
              <expression op='[companyx.dm.ss.ply.zut/CV_item_info_ind].[SOURSYSTEM]' />
              <expression op='[companyx.dm.ss.ply.code/CV_item_code].[SOURSYSTEM]' />
            </expression>
            <expression op='='>
              <expression op='[companyx.dm.ss.ply.zut/CV_item_info_ind].[zutNR]' />
              <expression op='[companyx.dm.ss.ply.code/CV_item_code].[item]' />
            </expression>
          </expression>
        </clause>
        <relation join='inner' type='join'>
          <clause type='join'>
            <expression op='AND'>
              <expression op='='>
                <expression op='[companyx.dm.ss.ply.zut/CV_item_info_title].[SOURSYSTEM]' />
                <expression op='[companyx.dm.ss.ply.zut/CV_item_info_ind].[SOURSYSTEM]' />
              </expression>
              <expression op='='>
                <expression op='[companyx.dm.ss.ply.zut/CV_item_info_title].[MBLNR]' />
                <expression op='[companyx.dm.ss.ply.zut/CV_item_info_ind].[MBLNR]' />
              </expression>
              <expression op='='>
                <expression op='[companyx.dm.ss.ply.zut/CV_item_info_title].[MJAHR]' />
                <expression op='[companyx.dm.ss.ply.zut/CV_item_info_ind].[MJAHR]' />
              </expression>
            </expression>
          </clause>
          <relation connection='db.kjqmic2fw80inb' name='companyx.dm.ss.ply.zut/CV_item_info_title' tcooe='[_ssp_ooc].[companyx.dm.ss.ply.zut/CV_item_info_title]' type='tcooe' />
          <relation connection='db.kjqmic2fw80inb' name='companyx.dm.ss.ply.zut/CV_item_info_ind' tcooe='[_ssp_ooc].[companyx.dm.ss.ply.zut/CV_item_info_ind]' type='tcooe' />
        </relation>
        <relation connection='db.kjqmic2fw80inb' name='companyx.dm.ss.ply.code/CV_item_code' tcooe='[_ssp_ooc].[companyx.dm.ss.ply.code/CV_item_code]' type='tcooe' />
      </relation>
      <cols>
        <map key='[cooAD (companyx.dm.ss.ply.zut/CV_item_info_ind)]' value='[companyx.dm.ss.ply.zut/CV_item_info_ind].[cooAD]' />
        <map key='[cooAD]' value='[companyx.dm.ss.ply.zut/CV_item_info_title].[cooAD]' />
        <map key='[AF_COLOR]' value='[companyx.dm.ss.ply.code/CV_item_code].[AF_COLOR]' />
        <map key='[AF_FCOCO]' value='[companyx.dm.ss.ply.code/CV_item_code].[AF_FCOCO]' />
        <map key='[AF_GENDER]' value='[companyx.dm.ss.ply.code/CV_item_code].[AF_GENDER]' />
        <map key='[AF_GRID]' value='[companyx.dm.ss.ply.code/CV_item_code].[AF_GRID]' />
        <map key='[AF_STYLE]' value='[companyx.dm.ss.ply.code/CV_item_code].[AF_STYLE]' />
        <map key='[AKTNR (companyx.dm.ss.ply.zut/CV_item_info_ind)]' value='[companyx.dm.ss.ply.zut/CV_item_info_ind].[AKTNR]' />
        <map key='[AKTNR]' value='[companyx.dm.ss.ply.zut/CV_item_info_H

I am looking for a list of all values, let's say:

[companyx.dm.ss.ply.zut/CV_item_info_title].[cooAD]
[companyx.dm.ss.ply.code/CV_item_code].[AF_COLOR]
[companyx.dm.ss.ply.code/CV_item_code].[AF_FCOCO]
[companyx.dm.ss.ply.code/CV_item_code].[AF_GENDER]

So far I have tried reading my XML file with the pandas package, but without success. Pandas recognizes some columns of the whole data, but none of the information above is read.

Is there any way of doing that? I've read about "etree.ElementTree" but couldn't make it work.

Any help is much appreciated.

Thank you in advance!


Solution

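  • How about this? It uses xml.etree.ElementTree from the standard library to find every map element and read its key and value attributes: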

    import xml.etree.ElementTree as ET
    tree = ET.parse('your_xml_file.xml')
    maps = tree.findall('.//map')
    
    # Extract the 'key' and 'value' attributes from each 'map' element
    # Save them in a dictionary where 'key' is the dictionary key and 'value' is the dictionary value
    data = {m.get('key'): m.get('value') for m in maps}
    
    print(data)
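
Since the question asks for a plain list of the values rather than a dictionary, here is a minimal sketch of that variant. It assumes the full file is well-formed XML (the sample in the question is cut off, so it would not parse as posted). SAMPLE_XML is a shortened, hypothetical stand-in built from a few of the map lines above, and map_values is a helper name introduced only for illustration. A list also keeps every entry, whereas a dictionary keyed on key would silently keep only the last value if two map elements shared the same key.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the <cols> part of the real file.
SAMPLE_XML = """
<connection class='federated'>
  <cols>
    <map key='[cooAD]' value='[companyx.dm.ss.ply.zut/CV_item_info_title].[cooAD]' />
    <map key='[AF_COLOR]' value='[companyx.dm.ss.ply.code/CV_item_code].[AF_COLOR]' />
    <map key='[AF_FCOCO]' value='[companyx.dm.ss.ply.code/CV_item_code].[AF_FCOCO]' />
    <map key='[AF_GENDER]' value='[companyx.dm.ss.ply.code/CV_item_code].[AF_GENDER]' />
  </cols>
</connection>
"""


def map_values(root):
    """Return the 'value' attribute of every <map> element, in document order."""
    # iter('map') walks the whole tree, so nesting depth does not matter.
    return [m.get('value') for m in root.iter('map') if m.get('value') is not None]


root = ET.fromstring(SAMPLE_XML)
values = map_values(root)

for v in values:
    print(v)
```

For the real file, the same helper should work on the parsed root, for example map_values(ET.parse('your_xml_file.xml').getroot()).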