
Date without time but with timezone information


I am extracting information from thousands of XML files using Python. The files were contributed by many parties and were certainly not created by one and the same toolchain, as can be seen from the differing formatting (some source files have no newlines at all, for example).

I came across the following breaking my processing routines:

<SubmissionDeadlinePeriod>
<EndDate>2024-02-21+01:00</EndDate>
<EndTime>10:00:00+01:00</EndTime>
</SubmissionDeadlinePeriod>

From the initial example XML files, which had EndDates of the YYYY-mm-dd variety, I did not expect what looks like timezone information on a date, as I am used to Python's datetime.date type, which has no timezone information associated with it. Timezone information is only available in datetime.time and datetime.datetime instances.

Things were easily remedied by splitting on '+' and taking the first element (and, after parsing a few thousand more files, by also adding code to remove a trailing Z from a string that is to be converted into a datetime.date).
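A minimal sketch of a slightly more defensive version of that workaround, for illustration only. It assumes the values follow the xs:date lexical form with a four-digit year, and it also accepts negative offsets such as -05:00, which splitting on '+' would miss. The helper name parse_xs_date is hypothetical.

```python
import re
from datetime import date, timedelta, timezone

# Assumed lexical form: YYYY-MM-DD, optionally followed by "Z" or a +hh:mm / -hh:mm offset.
_XS_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?")


def parse_xs_date(text):
    """Return (date, tzinfo or None) for an xs:date-style string (hypothetical helper)."""
    match = _XS_DATE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an xs:date: {text!r}")
    year, month, day, offset = match.groups()
    parsed = date(int(year), int(month), int(day))
    if offset is None:
        return parsed, None          # plain date, no timezone given
    if offset == "Z":
        return parsed, timezone.utc  # "Z" means UTC
    sign = -1 if offset[0] == "-" else 1
    hours, minutes = int(offset[1:3]), int(offset[4:6])
    return parsed, timezone(sign * timedelta(hours=hours, minutes=minutes))


end_date, end_tz = parse_xs_date("2024-02-21+01:00")
print(end_date)  # 2024-02-21
```

Returning the offset separately keeps the information available in case it is needed later, while the date itself remains a plain datetime.date.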

I am not used to having a timezone on a date without time information. My YAML parser doesn't allow timezone information unless a timestamp has at least the hours, minutes and seconds parts. And if I read ISO 8601 correctly, timezones can only be added to a time or to a date+time, but not to a date.

Is there some specific reason to have a timezone attached to a date as well as to the time in this context, that I am missing? Or is this just an artifact of some XML generating library in some unknown language?

I can see that splitting up a date and time in different fields might make it easier on routines to extract just the date from XML using tag information, but that ease seems to be negated by having to discard an optional timezone.


Solution

  • The XSD (XML Schema) specification allows an xs:date to have a timezone. Theoretically it makes sense; I remember someone saying they had two grandsons, one born on 30 Jan in Australia and the other on 29 Jan in London, and the Australian child was a couple of hours older than the British one. In practice, though, it is very rarely used.

    The XPath 2.0 function current-date() returns a date with a timezone offset, and the XML you are seeing may well be the result of serializing the output of that function without taking the trouble to remove the timezone offset first.

    I think you are right that ISO 8601 doesn't recognize this format.
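As an illustration of how the two fields from the question could be consumed, here is a minimal sketch that drops the offset on the date and relies on the offset carried by EndTime. It assumes a four-digit year, that both fields carry the same offset (as in the sample), and that the time uses a +hh:mm style offset rather than Z; the function name deadline_from is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime, time

# Sample element copied from the question.
SNIPPET = """<SubmissionDeadlinePeriod>
<EndDate>2024-02-21+01:00</EndDate>
<EndTime>10:00:00+01:00</EndTime>
</SubmissionDeadlinePeriod>"""


def deadline_from(element):
    """Build a timezone-aware datetime from EndDate and EndTime (hypothetical helper)."""
    # Keep only the YYYY-MM-DD part; assumes a four-digit year.
    end_date = date.fromisoformat(element.findtext("EndDate").strip()[:10])
    # time.fromisoformat keeps the +hh:mm offset as tzinfo.
    end_time = time.fromisoformat(element.findtext("EndTime").strip())
    # combine() takes the tzinfo from the time component.
    return datetime.combine(end_date, end_time)


deadline = deadline_from(ET.fromstring(SNIPPET))
print(deadline.isoformat())  # 2024-02-21T10:00:00+01:00
```

If the offsets on the date and the time could ever differ, they would need to be compared before the date's offset is discarded.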