I have an XSD schema file which includes the following definition:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  ...
  <xs:element name="CreateDate" minOccurs="0" maxOccurs="1">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:minLength value="8"/>
        <xs:maxLength value="10"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  ...
And an XML file which includes the following element:
<?xml version="1.0" encoding="utf-8"?>
...
<!-- format is: YYYY/mm/dd -->
<CreateDate>2020/10/22</CreateDate>
...
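Because 22 cannot be a month, the value 2020/10/22 is only valid when read as year/month/day, i.e. the strptime pattern %Y/%m/%d. A quick standard-library check of both possible readings (parses_as is a hypothetical helper, just for illustration):

```python
from datetime import datetime


def parses_as(text: str, fmt: str) -> bool:
    """Return True if text matches the strptime format fmt (hypothetical helper)."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


value = "2020/10/22"
print(parses_as(value, "%Y/%m/%d"))  # year/month/day: True
print(parses_as(value, "%Y/%d/%m"))  # year/day/month: False, 22 is not a month
print(datetime.strptime(value, "%Y/%m/%d").date())  # 2020-10-22
```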
I'm using xmlschema to parse the XML file like so:
import xmlschema

schema = xmlschema.XMLSchema(schema_file)
element = schema.to_dict(xml_file, datetime_types=True)
Obviously, CreateDate is parsed to a string instead of a Date object, because its type is a restriction of xs:string. Questions:
1. Is there a way to make xmlschema automatically parse CreateDate to a Date using the format "YYYY/mm/dd"?
2. Otherwise, I believe it's possible to use the value_hook or element_hook arguments to to_dict(), but I'm not sure how to go about it. Any suggestions?

I could only find the hook option on iter_decode(); here is an example that assumes a schema and an instance containing just a single element:
from pprint import pprint
from datetime import datetime

import xmlschema
from elementpath import datatypes

schema = xmlschema.XMLSchema('schema1.xsd')

def my_element_hook(elementData, xsdElement, xsdType):
    # Parse the "YYYY/mm/dd" text and rebuild the element data with an elementpath Date10 value
    thisDate = datetime.strptime(elementData.text, '%Y/%m/%d')
    return xmlschema.ElementData(tag=elementData.tag,
                                 text=datatypes.Date10(thisDate.year, thisDate.month, thisDate.day),
                                 attributes=None, content=None)

for value in schema.iter_decode('sample1.xml', datetime_types=True, element_hook=my_element_hook):
    pprint(value)
For a sample file sample1.xml like this:
<?xml version="1.0" encoding="utf-8"?>
<!-- format is: YYYY/mm/dd -->
<CreateDate>2020/10/22</CreateDate>
and a schema schema1.xsd like this:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="CreateDate">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:minLength value="8"/>
        <xs:maxLength value="10"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
I get Date10(2020, 10, 22).
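As a variation, the hook could return a standard-library datetime.date instead of elementpath's Date10, and use the namedtuple method _replace so the element's other fields are kept rather than reset to None. This is a hedged sketch: it assumes xmlschema.ElementData is a namedtuple (so _replace exists) and that the default converter passes a date object through the same way it passes Date10. To run without the library it uses a hypothetical stand-in, ElementDataStub, with the field names used above, and like the first example it assumes CreateDate is the only element.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for xmlschema.ElementData, using the field names from
# the hook above, so this sketch runs without xmlschema installed.
ElementDataStub = namedtuple("ElementDataStub", ["tag", "text", "content", "attributes"])


def date_element_hook(element_data, xsd_element=None, xsd_type=None):
    """Replace "YYYY/mm/dd" text with a datetime.date (single-element case)."""
    parsed = datetime.strptime(element_data.text, "%Y/%m/%d").date()
    # _replace returns a copy in which only `text` changes; tag, content and
    # attributes are kept instead of being reset to None
    return element_data._replace(text=parsed)


result = date_element_hook(ElementDataStub("CreateDate", "2020/10/22", None, None))
print(repr(result.text))  # datetime.date(2020, 10, 22)
```

With xmlschema itself, the function would be passed as element_hook=date_element_hook in the iter_decode() call shown earlier.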
For a more complex schema, I guess the element hook needs to return elementData unchanged for the elements you don't want to manipulate, and apply the conversion only when, for example, elementData.tag == 'CreateDate':
def my_element_hook(elementData, xsdElement, xsdType):
    if elementData.tag == 'CreateDate':
        # Same conversion as before, applied only to CreateDate
        thisDate = datetime.strptime(elementData.text, '%Y/%m/%d')
        return xmlschema.ElementData(tag=elementData.tag,
                                     text=datatypes.Date10(thisDate.year, thisDate.month, thisDate.day),
                                     attributes=None, content=None)
    else:
        # Every other element is passed through unchanged
        return elementData
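A different technique, which avoids hooks entirely, is to post-process whatever to_dict() returns: walk the nested dicts and lists and parse the values stored under CreateDate. This is a sketch under assumptions: it expects the default converter's output shape (element names as dict keys, repeated elements as lists) with unqualified key names. The function names convert_dates and _to_date are hypothetical, and the sample dict is invented for illustration.

```python
from datetime import datetime
from typing import Any


def _to_date(value: Any, fmt: str) -> Any:
    # A repeated element is assumed to decode to a list; convert each string in it
    if isinstance(value, list):
        return [_to_date(v, fmt) for v in value]
    if isinstance(value, str):
        return datetime.strptime(value, fmt).date()
    return value  # anything else (e.g. None) is left as-is


def convert_dates(data: Any, keys=frozenset({"CreateDate"}), fmt="%Y/%m/%d") -> Any:
    """Return a copy of decoded data with values under `keys` parsed as dates."""
    if isinstance(data, dict):
        return {
            k: (_to_date(v, fmt) if k in keys else convert_dates(v, keys, fmt))
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [convert_dates(item, keys, fmt) for item in data]
    return data


decoded = {"Header": {"CreateDate": "2020/10/22", "Title": "x"}}  # invented sample
print(convert_dates(decoded))
# {'Header': {'CreateDate': datetime.date(2020, 10, 22), 'Title': 'x'}}
```

Unlike an element hook, this runs after decoding and validation have finished, so it only changes the returned values and has no effect on validation.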