I'm pretty new to programming and this is the first time I've used XML, but for class I'm doing a gender classification project with a dataset of blogs. I have a folder of XML files. First I need to make a list of the file names in it. Then I want to loop over that list, open each XML file, pull out what I need (e.g. the text and the class), and store it in another variable, such as a list or dictionary.
I tried something, but it isn't right and I'm kind of stuck. Can someone help me? This is what I have so far:
path = '\\Users\\name\\directory\\folder'
dir = os.listdir(path)

def select_files_in_folder(dir, ext):
    for filename in os.listdir(path):
        fullname = os.path.join(path, filename)
        tree = ET.parse(fullname)
        for elem in doc.findall('gender'):
            print(elem.get('gender'), elem.text)
If you want to build a list of all the XML files in a given directory, you can do the following:
import os

def get_xml_files(path):
    xml_list = []
    for filename in os.listdir(path):
        # keep only names ending in .xml, stored as full paths
        if filename.endswith(".xml"):
            xml_list.append(os.path.join(path, filename))
    return xml_list
Just keep in mind that this does not recurse into subfolders, and it assumes that the XML file names end with .xml.
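If you ever do need to descend into subfolders, or your files might end in .XML, here is a minimal sketch using pathlib. The function name get_xml_files_recursive is my own and not something from your code, so treat it as one possible approach rather than the only way to do it.

```python
from pathlib import Path

def get_xml_files_recursive(path):
    """Return the sorted paths of all .xml files under path, including subfolders."""
    return sorted(
        str(p)
        for p in Path(path).rglob("*")                   # walk the whole tree
        if p.is_file() and p.suffix.lower() == ".xml"    # match .xml in any case
    )
```

It returns plain strings, so the result can be passed on the same way as the list from get_xml_files.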
Parsing XML is highly dependent on the library you'll be using. From your code I guess you're using xml.etree.ElementTree (keep in mind this library is not safe against maliciously constructed data). With it, you could collect the data like this:
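import xml.etree.ElementTree as ET

def get_xml_data(xml_files):
    data = []
    for filename in xml_files:
        tree = ET.parse(filename)
        # extend rather than reassign, so results from every file are kept
        data.extend(elem.text for elem in tree.findall("whatever you want to get"))
    return data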
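To tie the two steps together for your use case (text plus class per file), something along these lines might work. It is only a sketch: I don't know what your XML looks like, so the tag names "post" and "gender" are placeholders I made up, and load_blogs is a hypothetical helper name. If the gender is stored as an attribute rather than as element text, you would read it with elem.get(...) instead.

```python
import os
import xml.etree.ElementTree as ET

def load_blogs(path, text_tag="post", label_tag="gender"):
    """Build one {"file", "label", "text"} dict per XML file in path.

    text_tag and label_tag are assumed names; replace them with the
    tags your dataset actually uses.
    """
    records = []
    for filename in sorted(os.listdir(path)):
        if not filename.endswith(".xml"):
            continue
        fullname = os.path.join(path, filename)
        try:
            root = ET.parse(fullname).getroot()
        except ET.ParseError as err:
            # skip files that are not well-formed XML instead of crashing
            print(f"Skipping {fullname}: {err}")
            continue
        # findtext returns None when no matching element exists
        label = root.findtext(f".//{label_tag}")
        # gather the text of every matching element, ignoring empty ones
        texts = [el.text.strip() for el in root.iter(text_tag) if el.text]
        records.append({"file": filename, "label": label, "text": " ".join(texts)})
    return records
```

A list of dicts like this is easy to loop over later, or to split into one list of texts and one list of labels for a classifier.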