I'm fairly new to Python and coding in general, so sorry if this is a very simple question. I'm using the Python package xmlschema to validate some very large XML files. When I use the following code to get the error messages, I only get the paths of the errors. That is fine when there are only 5-6 different "knude" elements, but I have files with 200+ "knude" elements, which makes the path alone nearly useless. I would therefore like to get the line number, so I can go to the XML file and correct it.
Code:
```python
import xmlschema


def get_validation_errors(xml_file, xsd_file):
    schema = xmlschema.XMLSchema(xsd_file)
    validation_error_iterator = schema.iter_errors(xml_file)
    errors = []
    for idx, validation_error in enumerate(validation_error_iterator, start=1):
        err = str(validation_error)
        errors.append(err)
        print(f'[{idx}] path: {validation_error.path} | reason: {validation_error.reason} | message: {validation_error.message}')
    return errors
```
Results:
[1] path: /KnudeGroup/Knude[5]/StatusKode | reason: value must be one of [1, 2, 3, 4, 8] | message: failed validating 0 with XsdEnumerationFacets([1, 2, 3, 4, 8])
I have already tried reading the documentation and searched google and stackoverflow for an answer, but could not find any.
Load the XML instance document with lxml; that way each validation error has a sourceline property (https://github.com/sissaschool/xmlschema/blob/v2.2.3/xmlschema/validators/exceptions.py#L90). A minimal example:
```python
import lxml.etree as ET
from xmlschema import XMLSchema

xml_doc = ET.parse("sample1.xml")
schema = XMLSchema("sample1.xsd")
for error in schema.iter_errors(xml_doc):
    print(f'sourceline: {error.sourceline}; path: {error.path} | reason: {error.reason} | message: {error.message}')
```
This then outputs the line number as sourceline, e.g.:
sourceline: 2; path: /root/item[1] | reason: invalid value 'a' for xs:decimal | message: failed validating 'a' with XsdAtomicBuiltin(name='xs:decimal')