Tags: python, python-3.x, datetime, dateparser

Why does the dateparser module fail to parse date strings in Spanish, even when the languages or locales parameter is set to Spanish?


I'm trying to parse a large set of files whose records include dates in Spanish, in formats like 'Ago 01, 2022'. For this task, I'm using the parse function from the dateparser module. In the past I used that function successfully for a similar purpose, but now it fails on strings in Spanish, even if I set the languages or locales parameter of parse.

I import the function parse with this line:

from dateparser import parse
  1. If I call the function with a date in English, it runs successfully, as I expect:
parse('Aug 01, 2021', date_formats=['%b %d, %Y'] )

# Returns
datetime.datetime(2021, 8, 1, 0, 0)
  2. If I call the function with a date in Spanish and no other parameter, it fails, which I also expect (August is "Agosto" in Spanish):

parse('Ago 01, 2021', date_formats=['%b %d, %Y'] )

# Raises an exception in regex that ends with:

~\anaconda3\lib\site-packages\regex\_regex_core.py in _compile_replacement(source, pattern, is_unicode)
   1735                 return False, [value]
   1736 
-> 1737         raise error("bad escape \\%s" % ch, source.string, source.pos)
   1738 
   1739     if isinstance(source.sep, bytes):

error: bad escape \d at position 7

I suppose this error is related to a regex pattern used for Spanish, but I can't tell what the problem is beyond the language.
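The traceback ends in `_compile_replacement`, which suggests the failure happens while a substitution's replacement template is being compiled rather than while the Spanish text is being matched. The sketch below is not dateparser's code. It only reproduces the same class of error with the standard-library `re` module, assuming a replacement template that contains a regex-only escape such as `\d`:

```python
import re

# Hypothetical illustration, not taken from dateparser: "\d" is valid in a
# pattern but is rejected when it appears in a replacement template.
message = None
try:
    re.sub(r"ago", r"\d", "ago 01, 2021")
except re.error as exc:
    message = str(exc)
    print(message)

# Doubling the backslash turns the template into a literal backslash + "d".
fixed = re.sub(r"ago", r"\\d", "ago 01, 2021")
print(fixed)
```

If this is the mechanism at work, the language of the input would matter only because the Spanish code path is the one that triggers the substitution.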

  3. Passing a languages parameter to parse doesn't change the result.
parse('Ago 01, 2021', date_formats=['%b %d, %Y'], languages=['es'])

# Raises the same exception that ends with:

~\anaconda3\lib\site-packages\regex\_regex_core.py in _compile_replacement(source, pattern, is_unicode)
   1735                 return False, [value]
   1736 
-> 1737         raise error("bad escape \\%s" % ch, source.string, source.pos)
   1738 
   1739     if isinstance(source.sep, bytes):

error: bad escape \d at position 7

  4. The same happens if I set the locales parameter.
parse('Ago 01, 2021', date_formats=['%b %d, %Y'], locales=['es'])

# Raises the same exception that ends with:

~\anaconda3\lib\site-packages\regex\_regex_core.py in _compile_replacement(source, pattern, is_unicode)
   1735                 return False, [value]
   1736 
-> 1737         raise error("bad escape \\%s" % ch, source.string, source.pos)
   1738 
   1739     if isinstance(source.sep, bytes):

error: bad escape \d at position 7


I'm not sure whether the problem is related to an update or a change in the module, but I should mention that I get this warning the first time I call parse:

~\anaconda3\lib\site-packages\dateparser\utils\__init__.py:130: PytzUsageWarning: The localize
method is no longer necessary, as this time zone supports the fold attribute (PEP 495). 
For more details on migrating to a PEP 495-compliant implementation, see 
https://pytz-deprecation-shim.readthedocs.io/en/latest/migration.html
date_obj = tz.localize(date_obj)

Looking for insight, I tried the dateparser demo at https://dateparser-demo.netlify.app/, which is linked from the GitHub repository https://github.com/scrapinghub/dateparser, which in turn is linked from PyPI https://pypi.org/project/dateparser/. In that demo, my Spanish string is parsed successfully. I assumed I had an old version of dateparser, so I checked, and I have the latest version available on PyPI.

  • I'm using Python 3.7.3 and dateparser 1.1.1 (currently the latest) on a Windows 10 machine set to Spanish.
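Since the traceback comes from the `regex` package rather than from dateparser itself, it can help to record the installed versions of both. A minimal sketch, assuming Python 3.8 or later for `importlib.metadata`; the helper name `installed_version` is hypothetical:

```python
from importlib import metadata  # standard library on Python 3.8+


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions of the two packages that appear in the traceback.
for name in ("dateparser", "regex"):
    print(name, installed_version(name))
```

On Python 3.7, as used in the question, `dateparser.__version__` reports the dateparser version instead.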

Solution

  • This has been fixed in more recent releases. With dateparser 1.1.3, the three failing calls from the question parse correctly.
    Can you upgrade and check that everything works as expected now?

    >>> parse('Ago 01, 2021', date_formats=['%b %d, %Y'] )
    datetime.datetime(2021, 8, 1, 0, 0)
    >>> parse('Ago 01, 2021', date_formats=['%b %d, %Y'], languages=['es'])
    datetime.datetime(2021, 8, 1, 0, 0)
    >>> parse('Ago 01, 2021', date_formats=['%b %d, %Y'], locales=['es'])
    datetime.datetime(2021, 8, 1, 0, 0)
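If upgrading is not an option, a standard-library fallback may be enough for this particular format. This is only a sketch, assuming every record looks exactly like 'Ago 01, 2021' and uses the three-letter abbreviations listed below; `SPANISH_MONTHS` and `parse_spanish_date` are hypothetical names, not part of dateparser:

```python
from datetime import datetime

# Assumed Spanish three-letter month abbreviations mapped to month numbers.
SPANISH_MONTHS = {
    "ene": 1, "feb": 2, "mar": 3, "abr": 4, "may": 5, "jun": 6,
    "jul": 7, "ago": 8, "sep": 9, "oct": 10, "nov": 11, "dic": 12,
}


def parse_spanish_date(text):
    """Parse strings shaped like 'Ago 01, 2021'; return None if they do not fit."""
    try:
        # Drop the comma, then expect exactly three tokens: month, day, year.
        month_abbr, day, year = text.replace(",", " ").split()
        month = SPANISH_MONTHS[month_abbr.lower().rstrip(".")]
        return datetime(int(year), month, int(day))
    except (ValueError, KeyError):
        return None


print(parse_spanish_date("Ago 01, 2021"))
```

Unlike dateparser, this handles one fixed layout only, so records in any other format come back as None and need separate handling.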