
Match codes from an XML file to string literals in a txt file in Python


I am trying to match XML enumerations to their corresponding string literals in Python. The XML file and the txt file are shown below.

XML file

<?xml version="1.0" encoding="utf-8"?><xs:schema id="Enumerations" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs:simpleType name="TypeMakeModelCode"><xs:restriction base="xs:string">
<xs:enumeration value="AUTO-AC-300" />
<xs:enumeration value="AUTO-ACAD-BEA" />
<xs:enumeration value="AUTO-ACAD-CAN" />
<xs:enumeration value="AUTO-ACAD-INV" />
<xs:enumeration value="AUTO-ACUR-ATL" />
<xs:enumeration value="AUTO-ACUR-CL" />
<xs:enumeration value="AUTO-ACUR-CSX" />
<xs:enumeration value="AUTO-ACUR-EL" />
<xs:enumeration value="AUTO-ACUR-ILX" />
<xs:enumeration value="AUTO-ACUR-INT" />
</xs:restriction></xs:simpleType></xs:schema>

txt file

VEHICLE CODE      VEHICLE LITERAL
__________________________________________________

AUTO-AC-300        A C (GREAT BRITIAN) 3000 ME
AUTO-ACAD-BEA      ACADIAN (GM OF CANADA) BEAUMONT SERIES
AUTO-ACAD-CAN      ACADIAN (GM OF CANADA) CANSO SERIES
AUTO-ACAD-INV      ACADIAN (GM OF CANADA) INVADER SERIES
AUTO-ACUR-ATL      ACURA TL
AUTO-ACUR-CL       ACURA CL
AUTO-ACUR-CSX      ACURA CSX
AUTO-ACUR-EL       ACURA EL
AUTO-ACUR-ILX      ACURA NULL
AUTO-ACUR-INT      ACURA INTEGRA

I haven't tried anything yet, as I'm not sure how to approach it.


Solution

  • You don't need the XML file. The text file alone already contains both the codes and their literals.

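    data = {}
    with open('data.txt') as f:
        # Iterate over the file directly; readlines() is not needed
        for line in f:
            # Skip the header, the underline, and blank lines
            if line.startswith('AUTO'):
                # First token is the code; the rest is the literal
                parts = line.split()
                key = parts[0]
                value = ' '.join(parts[1:])
                data[key] = value
    print(data)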
    

    Output:

    {'AUTO-AC-300': 'A C (GREAT BRITIAN) 3000 ME', 'AUTO-ACAD-BEA': 'ACADIAN (GM OF CANADA) BEAUMONT SERIES', 'AUTO-ACAD-CAN': 'ACADIAN (GM OF CANADA) CANSO SERIES', 'AUTO-ACAD-INV': 'ACADIAN (GM OF CANADA) INVADER SERIES', 'AUTO-ACUR-ATL': 'ACURA TL', 'AUTO-ACUR-CL': 'ACURA CL', 'AUTO-ACUR-CSX': 'ACURA CSX', 'AUTO-ACUR-EL': 'ACURA EL', 'AUTO-ACUR-ILX': 'ACURA NULL', 'AUTO-ACUR-INT': 'ACURA INTEGRA'}
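
If you do want to check the schema's enumerations against the text file (for example, to spot codes that have no literal), a minimal sketch along the following lines might help. It reuses the parsing approach from the answer above and reads the codes with the standard library's `xml.etree.ElementTree`. The function names `read_codes`, `read_literals`, and `match_codes` are made up for illustration, the inline samples are shortened stand-ins for the real files, and `AUTO-XXXX-000` is a hypothetical code added only to show the unmatched case.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by ElementTree for XML Schema elements
XS = "{http://www.w3.org/2001/XMLSchema}"

def read_codes(xml_text):
    """Return the value of every xs:enumeration, in document order."""
    root = ET.fromstring(xml_text)
    return [e.get("value") for e in root.iter(XS + "enumeration")]

def read_literals(lines):
    """Build a {code: literal} dict from the lines of the text file."""
    literals = {}
    for line in lines:
        # Assumption: every data row starts with 'AUTO', as in the question
        if line.startswith("AUTO"):
            parts = line.split()
            literals[parts[0]] = " ".join(parts[1:])
    return literals

def match_codes(codes, literals):
    """Map each schema code to its literal, or None if it is missing."""
    return {code: literals.get(code) for code in codes}

# Shortened sample data; AUTO-XXXX-000 is hypothetical and has no literal
SAMPLE_XML = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="TypeMakeModelCode"><xs:restriction base="xs:string">
<xs:enumeration value="AUTO-ACUR-CL" />
<xs:enumeration value="AUTO-ACUR-EL" />
<xs:enumeration value="AUTO-XXXX-000" />
</xs:restriction></xs:simpleType></xs:schema>"""

SAMPLE_TXT = """VEHICLE CODE      VEHICLE LITERAL
AUTO-ACUR-CL       ACURA CL
AUTO-ACUR-EL       ACURA EL
"""

matched = match_codes(read_codes(SAMPLE_XML),
                      read_literals(SAMPLE_TXT.splitlines()))
print(matched)
```

With the real files, you would pass the XML file's contents to `read_codes` and the open text file (or its lines) to `read_literals`. Any code that maps to `None` is an enumeration without a matching row in the text file.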