Tags: python, json, xml, xmltodict

How to traverse a nested JSON object and delete or modify its elements in Python after parsing an XML file?


Given the XML below, I need to convert it to JSON and modify the resulting JSON object.

<?xml version="1.0" standalone="yes"?>
<!--COUNTRIES is the root element-->
<WORLD>
    <country name="A">
        <event day="323" name="$abcd"> </event>
        <event day="23" name="$aklm"> </event>

        <neighbor name="B" direction="W" friend="T"></neighbor>
        <neighbor name="B" direction="W"></neighbor>
        <neighbor name="B" direction="W"></neighbor>
    </country>
    <country name="C">
        <event day="825" name="$nmre"> </event>
        <event day="329" name="$lpok"> </event>
        <event day="145" name="$dswq"> </event>
        <event day="256" name="$tyul"> </event>

        <neighbor name="D" direction="N"/>
        <neighbor name="B" direction="W" friend="T"/>
    </country>
</WORLD>

I want to remove the "event" elements and the "friend" attribute (found under "WORLD" -> "country" -> "neighbor") from the final JSON output. I am using the "xmltodict" library in Python and can convert the XML to JSON successfully, but I have not been able to remove these elements and attributes from the result.

Python Code:

import xmltodict, json
class XMLParser:
    def __init__(self, xml_file_path):
        self.xml_file_path = xml_file_path
        if not self.xml_file_path:
            raise ValueError("XML file path is not found.")
        with open(self.xml_file_path, 'r') as f:
            self.xml_file = f.read()

    def parse_xml_to_json(self):
        xml_file = self.xml_file
        json_data = xmltodict.parse(xml_file, attr_prefix='')
        if 'event' in json_data['WORLD']['country']:
            del json_data['WORLD']['country']['event']
        return json.dumps(json_data, indent=4)
  
xml_file_path = "file_path"
xml_parser = XMLParser(xml_file_path)
json_object = xml_parser.parse_xml_to_json()
print(json_object)

Please suggest.


Solution

  • You can use a recursive function to remove the unwanted keys from the dictionary. The function below removes the key from each dictionary it visits, then recurses into that dictionary's values and into the items of any lists it finds.

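    def remove_key(d, key: str):
        """Recursively remove every occurrence of `key` from nested dicts and lists, in place."""
        if isinstance(d, dict):
            # pop with a default avoids a KeyError when the key is absent
            d.pop(key, None)
            for val in d.values():
                remove_key(val, key)
        elif isinstance(d, list):
            for item in d:
                remove_key(item, key)
        # Any other value (str, None, ...) is a leaf and is left untouched.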
    

    First, parse the input XML:

    import xmltodict
    import json
    
    xmltext = """<?xml version="1.0" standalone="yes"?>
    <!--COUNTRIES is the root element-->
    <WORLD>
        <country name="A">
            <event day="323" name="$abcd"> </event>
            <event day="23" name="$aklm"> </event>
    
            <neighbor name="B" direction="W" friend="T"></neighbor>
            <neighbor name="B" direction="W"></neighbor>
            <neighbor name="B" direction="W"></neighbor>
        </country>
        <country name="C">
            <event day="825" name="$nmre"> </event>
            <event day="329" name="$lpok"> </event>
            <event day="145" name="$dswq"> </event>
            <event day="256" name="$tyul"> </event>
    
            <neighbor name="D" direction="N"/>
            <neighbor name="B" direction="W" friend="T"/>
        </country>
    </WORLD>"""
    
    d = xmltodict.parse(xmltext)
    

    The value of d is the following:

    d
    # d has this value:
    {'WORLD': {'country': [{'@name': 'A',
        'event': [{'@day': '323', '@name': '$abcd'},
         {'@day': '23', '@name': '$aklm'}],
        'neighbor': [{'@name': 'B', '@direction': 'W', '@friend': 'T'},
         {'@name': 'B', '@direction': 'W'},
         {'@name': 'B', '@direction': 'W'}]},
       {'@name': 'C',
        'event': [{'@day': '825', '@name': '$nmre'},
         {'@day': '329', '@name': '$lpok'},
         {'@day': '145', '@name': '$dswq'},
         {'@day': '256', '@name': '$tyul'}],
        'neighbor': [{'@name': 'D', '@direction': 'N'},
         {'@name': 'B', '@direction': 'W', '@friend': 'T'}]}]}}
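
Note that `country` is a list here because the element repeats. That is probably why the check in the question, `'event' in json_data['WORLD']['country']`, never matches: `in` on a list compares whole items and does not look inside each dict. A minimal sketch, using a hand-written stand-in for the parsed data (the `parsed` literal is an assumption, trimmed from the output above; with the question's `attr_prefix=''` the keys would have no `@`):

```python
# Hand-written stand-in for the parsed structure shown above (trimmed).
parsed = {
    "WORLD": {
        "country": [
            {"@name": "A", "event": [{"@day": "323"}], "neighbor": [{"@name": "B"}]},
            {"@name": "C", "event": [{"@day": "825"}], "neighbor": [{"@name": "D"}]},
        ]
    }
}

countries = parsed["WORLD"]["country"]

# `in` on a list tests whether some item equals 'event'; it does not look inside the dicts.
found_in_list = "event" in countries

# The key lives one level down, inside each country dict.
found_in_each = ["event" in country for country in countries]

print(type(countries).__name__, found_in_list, found_in_each)
# prints: list False [True, True]
```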
    

    Applying the function to d removes the unwanted keys:

    remove_key(d, 'event')
    remove_key(d, '@friend')
    
    d
    # d now has this value:
    {'WORLD': {'country': [{'@name': 'A',
        'neighbor': [{'@name': 'B', '@direction': 'W'},
         {'@name': 'B', '@direction': 'W'},
         {'@name': 'B', '@direction': 'W'}]},
       {'@name': 'C',
        'neighbor': [{'@name': 'D', '@direction': 'N'},
         {'@name': 'B', '@direction': 'W'}]}]}}
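
One caveat: `remove_key` strips the key at every depth, while the question only asks for `friend` to be removed under WORLD -> country -> neighbor. If the same attribute name may legitimately appear elsewhere, a path-restricted variant could look like the sketch below. The names `as_list` and `remove_at_path` are hypothetical helpers, not part of xmltodict, and the `data` literal is a hand-written stand-in with an extra top-level `@friend` added purely for illustration.

```python
def as_list(value):
    """xmltodict yields a dict for a single child and a list for repeated ones; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def remove_at_path(node, path, key):
    """Remove `key` only from dicts reached by following `path` from `node`, in place."""
    if not isinstance(node, dict):
        return
    if not path:
        node.pop(key, None)
        return
    for child in as_list(node.get(path[0])):
        remove_at_path(child, path[1:], key)


# Hand-written stand-in for parsed data; the top-level '@friend' is illustrative only.
data = {
    "WORLD": {
        "@friend": "keep me",
        "country": [
            {"@name": "A", "neighbor": [{"@name": "B", "@friend": "T"}, {"@name": "B"}]},
            {"@name": "C", "neighbor": {"@name": "D", "@friend": "T"}},
        ],
    }
}

remove_at_path(data, ["WORLD", "country", "neighbor"], "@friend")
print(data)
```

The same helper would cover the `event` elements with `remove_at_path(data, ["WORLD", "country"], "event")`.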
    

    Now you can export to JSON.

    with open('output.json', 'w') as fp:
        json.dump(d, fp, indent=4)
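
The method in the question returns a JSON string instead of writing a file. For that case `json.dumps` can be used in place of `json.dump`. A small sketch, where `cleaned` is a hand-written stand-in for the dictionary after the keys have been removed:

```python
import json

# Hand-written stand-in for the cleaned dictionary.
cleaned = {
    "WORLD": {
        "country": [
            {"@name": "A", "neighbor": [{"@name": "B", "@direction": "W"}]},
        ]
    }
}

# json.dumps returns the serialized text instead of writing it to a file object.
json_text = json.dumps(cleaned, indent=4)
print(json_text)

# Parsing the text again gives back an equal structure.
round_tripped = json.loads(json_text)
```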