
Skipping files if the XML part is missing


I am analyzing the XML data embedded in several files. To get at that data, I first need to extract the XML part from the rest of each file so that I can work with it.

For this I use the split() method to search for the "<Data" marker.

Here I run into a problem.

Some of the files contain no XML data at all, and I would like to simply skip those files.

import glob

path = r"C:\Users\Nathan\Desktop\Test\*.xml"

for xml in glob.glob(path):
    with open(xml) as data_file:
        file_content = data_file.read()

        xml_part1 = file_content.split("<Data", 1)[1]  # here I get an IndexError if "<Data" is not in the file
        xml_part2 = xml_part1.split("Data>", 1)[0]
        xml_file = "<Data" + xml_part2 + "Data>"
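The error on the first split line comes from how str.split behaves with maxsplit=1. This sketch uses made-up sample strings (not the asker's files) to show that when the separator is absent, split returns a one-element list, so indexing it with [1] raises IndexError:

```python
# Sample strings are hypothetical, chosen only to illustrate split() behavior.
with_xml = 'header <Data id="1">value</Data> footer'
without_xml = "header only, no XML here"

# "<Data" is present: maxsplit=1 gives two pieces, before and after the first match.
parts_found = with_xml.split("<Data", 1)
print(parts_found)    # ['header ', ' id="1">value</Data> footer']

# "<Data" is absent: split returns a one-element list holding the whole string.
parts_missing = without_xml.split("<Data", 1)
print(parts_missing)  # ['header only, no XML here']

try:
    parts_missing[1]  # there is no second element to index
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```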

I would be very grateful for any help.


Solution

  • You can use a try/except block to catch the IndexError raised when a file contains no XML data: in that case split("<Data", 1) returns a one-element list, so indexing it with [1] fails.

    import glob
    
    path = r"C:\Users\Nathan\Desktop\Test\*.xml"
    
    for xml in glob.glob(path):
        with open(xml, 'r') as data_file:
            file_content = data_file.read()
    
            try:
                xml_part1 = file_content.split("<Data", 1)[1]
                xml_part2 = xml_part1.split("Data>", 1)[0]
                xml_data = "<Data" + xml_part2 + "Data>"
                
                # Do stuff with your xml data
                
            except IndexError:
                print(f"Skipping file {xml} as it does not contain <Data>")
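If you would rather not use exceptions for control flow, str.partition is an alternative: it always returns a 3-tuple, and the middle (separator) element is an empty string when the marker is not found. The following is a minimal sketch assuming the same "<Data" / "Data>" markers as above; extract_data_block is a hypothetical helper name, and the function does not check that the result is well-formed XML. Unlike the try/except version, it also skips files where "<Data" appears without a closing "Data>" (the split version would keep the rest of the file in that case).

```python
import glob


def extract_data_block(text):
    """Hypothetical helper: return the first '<Data ... Data>' span, or None.

    Uses the same string markers as the split-based answer; it does not
    validate that the returned text is well-formed XML.
    """
    # partition never raises: if "<Data" is absent, start_sep is ''.
    _, start_sep, rest = text.partition("<Data")
    if not start_sep:
        return None  # no opening marker in this file
    body, end_sep, _ = rest.partition("Data>")
    if not end_sep:
        return None  # opening marker without a closing one
    return "<Data" + body + "Data>"


path = r"C:\Users\Nathan\Desktop\Test\*.xml"

for filename in glob.glob(path):
    with open(filename) as data_file:
        xml_data = extract_data_block(data_file.read())
    if xml_data is None:
        print(f"Skipping file {filename} as it does not contain <Data>")
        continue
    # Process xml_data here
```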