I have got a script that delves into a given directory recursively, digs out all files that have a specific file name and puts the contents of all the ones it finds into a single file.
The script works as I need it to in terms of hunting the files and outputting all the contents to a single xml file. However I would like it to add in the file path as well so I know the location of each given file it has found. This is my code so far:
import glob
lines=[] #list
for inputfile in glob.glob('D:\path\to\scan\**\project_information.xml', recursive=True):
with open ('D:\path\result\output.xml', 'w') as out_file:
with open (inputfile, "rt") as in_file:
print("<projectpath>", inputfile, file=out_file)
for line in in_file:
lines.append(line.rstrip('\n'))
for linenum, line in enumerate(lines):
print(line, file=out_file)
Each project_information.xml the script looks for looks like the below:
<project>
<name>project1</name>
<Projection>UTM84-30N</Projection>
<Build>Jan 30 2015</Build>
</project>
Ideally for every entry that is found and subsequently added to my output I would like it write the file path within the tag. (You may have guess this is an XML table to go in to Excel) for example -
<project>
**<filepath>C:\path\to\file\**
<name>project1</name>
<Projection>UTM84-30N</Projection>
<Build>Jan 30 2015</Build>
</project>
<project>
**<filepath>C:\path\to\file\blah\**
<name>project2</name>
<Projection>UTM84-30N</Projection>
<Build>Jan 30 2015</Build>
</project>
I thought adding a line after the with statement would do it but that only appears to print the first occurrence and not get in the loop - Not only that even if that did work, it wouldn't be within the parent tag.
print("<projectpath>", inputfile, file=out_file)
Pretty new to python so any suggestions or changes to my crude code are very much appreciated!
############UPDATESo as @olinox14 suggested I had a look at the lxml module and got it to add a new tag in with that.
tree = ET.parse(in_file)
root = tree.getroot()
for child in root:
folder = ET.SubElement(child, 'folder')
folder.text = (in_file.name)
tree.write('C:\\output\\output2.xml')
Many things here:
for inputfile in...
and with open(...) as out_file
to avoid opening/closing this file at each iterationprint(..., file=...)
when you can use out_file.write(...)
A good start here, you could need to arrange it:
import glob
from lxml import etree
root = etree.Element("root")
with open ('D:\path\result\output.xml', 'w') as out_file:
for inputfile in glob.glob('D:\path\to\scan\**\project_information.xml', recursive=True):
with open (inputfile, "r") as in_file:
project = etree.SubElement(root, "project", path=in_file)
# ... continue with your processing
root.write(out_file, pretty_print=True)