File Structure
I have a folder called test_folder that contains several subfolders (each named after a fruit, as you'll see in my code below). Every subfolder contains a metadump.xml file, from which I am extracting information.
Current Stance
I have been able to achieve this on an individual basis, where I specify the subfolder path.
import re

# Read the whole XML file as text
in_file = open("C:/.../Downloads/test_folder/apple/metadump.xml")
contents = in_file.read()
in_file.close()

# The pattern must be a single string; adjacent literals are joined by Python
title = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" '
                  'rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>',
                  contents).group(1)
print(title)
Next Steps
I would like to perform the same extraction on a larger scale by referencing only the parent folder C:/.../Downloads/test_folder and having my program find the xml file in each subfolder to extract the desired information, rather than specifying every fruit subfolder individually.
Clarification
Rather than simply obtaining a list of subfolders or a list of xml files within these subfolders, I would like to actually open each subfolder and run this text extraction on the xml file inside it.
Thanks in advance for your help.
You can use Python's os.walk() to traverse all of the subfolders. Whenever it finds a file named metadump.xml, the code below opens it and extracts your title. The filename and the title are then displayed:
import os
import re

# os.walk yields every folder under the parent, one at a time
for root, dirs, files in os.walk(r"C:\...\Downloads\test_folder"):
    for file in files:
        if file == 'metadump.xml':
            filename = os.path.join(root, file)
            with open(filename) as f_xml:
                contents = f_xml.read()
            title = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>', contents).group(1)
            print('{} : {}'.format(filename, title))
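One thing to watch for: re.search() returns None when a file has no matching title, so calling .group(1) on it raises an AttributeError and stops the whole loop. Here is a rough sketch of one way to guard against that, using pathlib's rglob() in place of os.walk(). The extract_titles helper, the TITLE_RE name and the demo folder contents are all made up for illustration; only the pattern comes from your code.

```python
import re
import tempfile
from pathlib import Path

# Same pattern as in the question, compiled once (TITLE_RE is a made-up name)
TITLE_RE = re.compile(
    '<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" '
    'rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>'
)


def extract_titles(parent):
    """Return {path to metadump.xml: title, or None if no title matched}."""
    titles = {}
    # rglob looks in `parent` and in every folder beneath it
    for xml_path in Path(parent).rglob("metadump.xml"):
        match = TITLE_RE.search(xml_path.read_text())
        # re.search returns None on no match, so check before calling .group()
        titles[xml_path] = match.group(1) if match else None
    return titles


# Demo on a throwaway folder; the fruit names and titles here are invented
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "apple": '<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" '
                 'rsfieldref="8" rsfieldtype="0">Red apple</dc:title>',
        "pear": "<other>no title here</other>",
    }
    for fruit, xml in samples.items():
        folder = Path(tmp) / fruit
        folder.mkdir()
        (folder / "metadump.xml").write_text(xml)

    for path, title in extract_titles(tmp).items():
        print("{} : {}".format(path, title))
```

To run it on your real data, you would pass your own parent folder to extract_titles() instead of the temporary one. Files without a matching title come back as None rather than crashing the run.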