
Python: Error extracting data from an input file (XML file); loop stops after some iterations


I have an XML file that looks something like this example. The file contains 5000 profiles (sets of data), each containing 92 rows and 5 columns, and consecutive profiles are separated by 2 lines (which I want to skip). I want to extract some selected profiles and write them to another file. I wrote the following program to do this, but with this code I can extract only a limited number of profiles.

    import itertools

    # lat holds the selected profile numbers; g is the output file, opened for writing
    with open('file.xml') as f:
        for j in lat:
            l = 94 * j
            i = l - 92
            g.write('%s' % j)
            g.write(":-profile")
            g.write("\n")
            for lines in itertools.islice(f, i, l):
                g.write('%s' % lines)
            g.write("</Matrix>")
            g.write("\n")
            g.write('<Matrix nrows="92" ncols="5">')
            g.write("\n")

When I print 'j', it takes all the values in 'lat' (my selected profiles). But in my output file I get data for only the first few profiles; after that, the file contains nothing but the lines written at the end of each iteration:

        g.write("</Matrix>")
        g.write("\n")
        g.write('<Matrix nrows="92" ncols="5">')
        g.write("\n")

I know this is probably a silly question, but I'm a beginner in Python programming. Please help.

I tried printing 'j' and 'lines' together: after a certain number of iterations, the output showed only the values of 'j', and there was no output for 'lines'.


Solution
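A likely reason the original loop stops producing output: `itertools.islice(f, i, l)` counts `i` and `l` from the file iterator's current position, not from the start of the file, and every call consumes the lines it skips and returns. After the first profile, the computed offsets point far past the intended lines, so the file is exhausted after a few iterations and later calls yield nothing. A minimal demonstration, using a hypothetical in-memory 10-line file (`io.StringIO` standing in for `file.xml`):

```python
import io
import itertools

# Hypothetical 10-line "file" standing in for file.xml
f = io.StringIO("".join(f"line {n}\n" for n in range(10)))

# First call: skips lines 0-1 and returns lines 2-3
first = list(itertools.islice(f, 2, 4))
# Second call with the SAME offsets: they are now counted from where the
# first call stopped (line 4), so it skips lines 4-5 and returns lines 6-7
second = list(itertools.islice(f, 2, 4))

print(first)   # ['line 2\n', 'line 3\n']
print(second)  # ['line 6\n', 'line 7\n']
```

Either reset the position before each slice (for example with `f.seek(0)`, which rereads the file every time) or keep track of how many lines have already been consumed.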

    import re

    # Read the selected profile numbers (whitespace-separated integers) into a set
    nums_profiles = set()
    with open("lat_sel.dat", "r") as num_profiles_file:
        for line in num_profiles_file:
            nums_profiles.update(int(i) for i in line.split())

    with open('extracted_output.xml', 'w') as output_file, open('chevallierl91_clear_q.xml', "r") as matrix_file:
        profile_counter = 0

        # Iterate line by line instead of loading the whole file with readlines()
        for line in matrix_file:

            # save the closing XML tags; skip the checks below so they are not written twice
            if '</Array>' in line or '</arts>' in line:
                output_file.write(line)
                continue

            # counting profiles: each profile starts with a <Matrix nrows=...> tag
            if 'Matrix nrows' in line:
                profile_counter += 1

            # save header of xml file, setting nelem to the number of extracted profiles
            if profile_counter == 0:
                if '<Array type="Matrix" nelem=' in line:
                    line = re.sub(r'nelem="[0-9]+"', 'nelem="%d"' % len(nums_profiles), line)

                output_file.write(line)

            # check if profile is the one which we need. If so, save data
            elif profile_counter in nums_profiles:
                output_file.write(line)