Parsing a folder of xml using glob and lxml


I'm having some difficulty parsing a folder of valid XML files (*.ditamap) using Python 3 and lxml.

The error returned is

"lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1"

my code

import glob
import lxml.etree as et

for file in glob.glob('*.ditamap'):
    with open(file) as xml_file:
        #tree = et.parse("0579182.ditamap")
        tree = et.parse(xml_file)
        print (et.tostring(tree, pretty_print=True))

et.parse works when I pass a filename directly, but not when I pass the file variable.

What am I doing wrong? It seems like there is some kind of IO error or type mismatch, but I cannot see what it is.


Solution

  • et.parse can take a file name, but you are giving it an opened file object. Try passing the path string that glob returns directly instead.

    import glob
    import lxml.etree as et
    
    for f in glob.glob('*.ditamap'):
        tree = et.parse(f)
        # encoding='unicode' makes tostring return str instead of bytes
        print(et.tostring(tree, pretty_print=True, encoding='unicode'))
    

    You may want to consider using glob.iglob because you are only using it as an iterator.

    Edit: I overlooked that et.parse can also accept file objects. Give it a try nevertheless.
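
For what it's worth, here is a minimal sketch that combines the two suggestions (passing the path and using glob.iglob) and also records which files fail to parse, since "Document is empty" can point at a zero-byte file somewhere in the folder. The helper name parse_folder and its folder/pattern parameters are made up for illustration, and the fallback to the standard library's ElementTree is only an assumption so the snippet still runs where lxml is not installed.

```python
import glob
import os

try:
    import lxml.etree as et  # the library used in the question
except ImportError:
    # Assumption: fall back to the standard library if lxml is unavailable.
    import xml.etree.ElementTree as et


def parse_folder(folder, pattern="*.ditamap"):
    """Parse every file in `folder` matching `pattern`.

    Returns two dicts keyed by path: parsed trees, and error
    messages for the files that could not be parsed.
    """
    trees, failures = {}, {}
    # iglob yields paths one at a time instead of building a full list.
    for path in glob.iglob(os.path.join(folder, pattern)):
        try:
            # Pass the path itself rather than an already-opened file.
            trees[path] = et.parse(path)
        except et.ParseError as exc:
            # In lxml, XMLSyntaxError is a subclass of ParseError.
            failures[path] = str(exc)
    return trees, failures
```

Calling parse_folder('.') in the folder from the question and printing the failures dict should show which file, if any, triggers the error, instead of the loop stopping at the first bad file.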