I have an XML file that looks something like this: example. The file contains 5000 profiles (sets of data), each with 92 rows and 5 columns, and consecutive profiles are separated by 2 lines, which I want to skip. I want to extract some selected profiles and write them to another file. I wrote the following program to do this, but it extracts only a limited number of profiles.
with open('file.xml') as f:
    for j in lat:
        l = 94*j
        i = l - 92
        g.write('%s' % j)
        g.write(":-profile")
        g.write("\n")
        for lines in itertools.islice(f, i, l):
            g.write('%s' % lines)
        g.write("</Matrix>")
        g.write("\n")
        g.write('<Matrix nrows="92" ncols="5">')
        g.write("\n")
When I print 'j', it takes on all the values of 'lat' (my selected profiles). In my output file, however, I get data for only the first few profiles; after that, it shows only what the last lines write:
g.write("</Matrix>")
g.write("\n")
g.write('<Matrix nrows="92" ncols="5">')
g.write("\n")
I know it's very silly, but I'm a beginner in Python programming. Please help.
I tried printing 'j' and 'lines' together. After a certain number of iterations, the output showed only the values of j; there was no output for lines.
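The symptom described above is consistent with how itertools.islice behaves on an open file. A file object is an iterator that remembers its position, so each islice(f, i, l) call counts its indices from wherever the previous call stopped, not from the top of the file. Once the file is exhausted, the inner loop yields nothing and only the surrounding g.write calls run. A minimal sketch of that behaviour, assuming io.StringIO as a stand-in for the real file and ten made-up lines of data:

```python
import io
from itertools import islice

# Stand-in for the real file: ten numbered lines (invented data).
f = io.StringIO("".join(f"line {n}\n" for n in range(10)))

# First call: skips 2 lines, yields lines 2 and 3, and leaves the
# file positioned at line 4.
first = [s.strip() for s in islice(f, 2, 4)]

# Second call: indices are counted from the current position (line 4),
# so this yields lines 8 and 9 rather than lines 4 and 5.
second = [s.strip() for s in islice(f, 4, 6)]

# Third call: the file is already exhausted, so nothing is yielded.
third = [s.strip() for s in islice(f, 6, 8)]

print(first, second, third)
```

The same thing would happen with the 94-line blocks in the question: the absolute offsets i and l grow with j, but the file position has already moved forward, so the reads overshoot and soon run off the end of the file.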
import re

nums_profiles = set()
with open("lat_sel.dat", "r") as num_profiles_file:
    for line in num_profiles_file.readlines():
        for i in line.split():
            nums_profiles.add(int(i))

with open('extracted_output.xml', 'w') as output_file, open('chevallierl91_clear_q.xml', "r") as matrix_file:
    profile_counter = 0
    for line in matrix_file.readlines():
        # save the ending xml tags
        for end_tag in ['</Array>', '</arts>']:
            if end_tag in line:
                output_file.write(line)
        # counting profiles
        if 'Matrix nrows' in line:
            profile_counter += 1
        # save header of xml file
        if profile_counter == 0:
            if '<Array type="Matrix" nelem=' in line:
                line = re.sub('nelem="[0-9]+"', 'nelem="%s"', line) % len(nums_profiles)
            output_file.write(line)
        # check if profile is the one which we need. If so, save data
        if profile_counter in nums_profiles:
            output_file.write(line)
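If you would rather keep the index arithmetic from the question than match on the 'Matrix nrows' tag, one option is to walk the file once with enumerate and decide per line whether it belongs to a selected profile. The sketch below is only an illustration: select_profile_lines is a hypothetical helper, and it assumes (following l=94*j and i=l-92 in the question) that profile j occupies the 0-based line indices 94*j-92 up to, but not including, 94*j. The demo uses tiny made-up blocks of 4 lines with 2 data rows instead of 94 and 92.

```python
def select_profile_lines(lines, selected, block=94, rows=92):
    """Yield (profile_number, line) for lines of the selected profiles.

    Assumption taken from the question's arithmetic: profile j occupies
    0-based indices block*j - rows up to (but excluding) block*j.
    """
    wanted = set(selected)
    for index, line in enumerate(lines):
        # 1-based number of the block that contains this line.
        j = index // block + 1
        # Position inside the block; the first (block - rows) lines
        # are the separator lines to skip.
        offset = index % block
        if j in wanted and offset >= block - rows:
            yield j, line


# Tiny demo with invented data: 3 blocks of 4 lines, 2 data rows each.
demo_lines = [str(n) for n in range(12)]
result = list(select_profile_lines(demo_lines, {1, 3}, block=4, rows=2))
print(result)
```

Because an open file can be passed as lines, the file is read exactly once from the start, which avoids the position problem that repeated islice calls run into.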