python datetime unicode ascii iso8601

Should hyphen-minus (U+002D) or hyphen (U+2010) be used for ISO 8601 datetimes?


The Python interpreter produces the following when generating an ISO 8601 formatted date/time string:

>>> import datetime
>>> datetime.datetime.now().isoformat(timespec='seconds')
'2023-10-12T22:35:02'

Note that the '-' character in the string is a hyphen-minus (U+002D). Going the other way, to produce the datetime object from the string, we do the following:

>>> datetime.datetime.strptime('2023-10-12T22:35:02', '%Y-%m-%dT%H:%M:%S')
datetime.datetime(2023, 10, 12, 22, 35, 2)

This all checks out.

However, when the ISO 8601 formatted date/time string comes from an external source, such as a parameter in a GET/POST request or a field in a .csv file, the hyphens sometimes arrive as the HYPHEN character (U+2010), which breaks the parsing:

>>> datetime.datetime.strptime('2023‐10‐12T22:35:02', '%Y-%m-%dT%H:%M:%S')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '2023‐10‐12T22:35:02' does not match format '%Y-%m-%dT%H:%M:%S'

Which character does the standard require? Is it hyphen-minus (U+002D), as produced by Python's .isoformat(), or hyphen (U+2010)?

Would it be best practice to accept both?


Solution

  • The ISO 8601 standard is not publicly available for free. Perhaps someone who has a copy can post a more definitive answer.

    ISO has published a brief summary of the ISO 8601 standard. The summary consistently uses HYPHEN-MINUS (U+002D). (Thanks to Giacomo Catenazzi for pointing this out in a comment.)

    RFC 3339 is based on ISO 8601, and it consistently uses the HYPHEN-MINUS character (U+002D), not the Unicode HYPHEN character (U+2010). Note that HYPHEN-MINUS is an ASCII character, so using it avoids issues with differing character sets.

    Reference: https://datatracker.ietf.org/doc/html/rfc3339

    If you create timestamps intended to be consistent with ISO 8601, you should definitely use HYPHEN-MINUS.

    If you receive timestamps that are supposedly ISO 8601 but contain HYPHEN (U+2010) characters, you can choose to accept them. Whether you should depends on the requirements of your project. If possible, ask whoever generates the timestamps to use the correct HYPHEN-MINUS characters. Once you start accepting non-standard input, you might have to do an open-ended amount of work.
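
If you do decide to accept such input, one possible approach is to normalize the look-alike characters to HYPHEN-MINUS before parsing. The following is only a minimal sketch: the helper name `parse_iso_lenient` and the table `HYPHEN_LIKE` are hypothetical, and the choice of which characters to map is an assumption you would adjust to whatever your sources actually send.

```python
import datetime

# Assumption: these are the look-alike characters we choose to tolerate.
# The list is illustrative, not something ISO 8601 or RFC 3339 defines.
HYPHEN_LIKE = {
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2212": "-",  # MINUS SIGN
}
_TRANSLATION = str.maketrans(HYPHEN_LIKE)


def parse_iso_lenient(text: str) -> datetime.datetime:
    """Hypothetical helper: map hyphen look-alikes to U+002D, then parse."""
    normalized = text.translate(_TRANSLATION)
    return datetime.datetime.strptime(normalized, "%Y-%m-%dT%H:%M:%S")


# The string from the question, written with explicit U+2010 escapes.
parsed = parse_iso_lenient("2023\u201010\u201012T22:35:02")
print(repr(parsed))  # datetime.datetime(2023, 10, 12, 22, 35, 2)
```

Keeping the mapping explicit and small means any other unexpected character still raises ValueError, so the set of accepted non-standard input does not grow silently.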