I'm trying to parse a large set of files with records that include dates in Spanish, in formats like 'Ago 01, 2022'. For this task, I'm using the parse function from the dateparser module. In the past, I used that function successfully for a similar purpose, but now it fails on strings in Spanish even if I set the languages or locales parameters of parse.
I import the function parse with this line:
from dateparser import parse
If I call the function with a date in English, it runs successfully, as I expect:
parse('Aug 01, 2021', date_formats=['%b %d, %Y'])
# Returns
datetime.datetime(2021, 8, 1, 0, 0)
If I call the function with a date in Spanish and no other parameters, it fails, as I also expect (August in Spanish is Agosto):
parse('Ago 01, 2021', date_formats=['%b %d, %Y'] )
# Raises an exception in regex that ends with:
~\anaconda3\lib\site-packages\regex\_regex_core.py in _compile_replacement(source, pattern, is_unicode)
1735 return False, [value]
1736
-> 1737 raise error("bad escape \\%s" % ch, source.string, source.pos)
1738
1739 if isinstance(source.sep, bytes):
error: bad escape \d at position 7
I suppose that this error is somehow related to a regex pattern for Spanish, but I cannot tell what the problem is beyond the language.
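One detail in the traceback may help narrow this down: the error is raised inside _compile_replacement, which suggests it comes from a replacement template rather than from a search pattern. The following is only a minimal sketch of that class of error using the standard re module. It is an assumption that this mirrors what happens inside dateparser; the helper name replacement_error is hypothetical and is not part of any library.

```python
import re


def replacement_error(template):
    """Return the error message raised when `template` is used as a
    replacement string in re.sub, or None if the template is accepted.

    Hypothetical helper for illustration only.
    """
    try:
        re.sub(r"x", template, "x")
    except re.error as exc:
        return str(exc)
    return None


# A lone \d is valid in a search pattern but not in a replacement template,
# so this call reports a "bad escape" error.
print(replacement_error(r"\d"))

# Escaping the backslash makes the template a literal backslash followed
# by "d", which is accepted.
print(replacement_error(r"\\d"))
```

If this reading is right, the failure would depend on how the installed regex package treats escapes in replacement strings, not on the Spanish input as such.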
Passing parse a language parameter doesn't change the results:
parse('Ago 01, 2021', date_formats=['%b %d, %Y'], languages=['es'])
# Raises the same exception that ends with:
~\anaconda3\lib\site-packages\regex\_regex_core.py in _compile_replacement(source, pattern, is_unicode)
1735 return False, [value]
1736
-> 1737 raise error("bad escape \\%s" % ch, source.string, source.pos)
1738
1739 if isinstance(source.sep, bytes):
error: bad escape \d at position 7
parse('Ago 01, 2021', date_formats=['%b %d, %Y'], locales=['es'])
# Raises the same exception that ends with:
~\anaconda3\lib\site-packages\regex\_regex_core.py in _compile_replacement(source, pattern, is_unicode)
1735 return False, [value]
1736
-> 1737 raise error("bad escape \\%s" % ch, source.string, source.pos)
1738
1739 if isinstance(source.sep, bytes):
error: bad escape \d at position 7
I'm not sure if the problem is related to an update or a change in the module, but I want to mention that when I call parse for the first time, I get this warning message:
~\anaconda3\lib\site-packages\dateparser\utils\__init__.py:130: PytzUsageWarning: The localize method is no longer necessary, as this time zone supports the fold attribute (PEP 495).
For more details on migrating to a PEP 495-compliant implementation, see
https://pytz-deprecation-shim.readthedocs.io/en/latest/migration.html
date_obj = tz.localize(date_obj)
Looking for some insight, I tried the dateparser demo at https://dateparser-demo.netlify.app/, which is linked from the GitHub repository https://github.com/scrapinghub/dateparser, which is in turn linked from PyPI https://pypi.org/project/dateparser/. In this demo, however, my string in Spanish is parsed successfully. I assumed that I had an old version of dateparser, so I checked, and I have the latest version available on PyPI.
I'm using python 3.7.3 and dateparser 1.1.1 (currently the latest) on a machine with Windows 10 in Spanish.

This has been fixed in recent versions.
dateparser 1.1.3
Can you check that everything is working as expected now?
>>> parse('Ago 01, 2021', date_formats=['%b %d, %Y'] )
datetime.datetime(2021, 8, 1, 0, 0)
>>> parse('Ago 01, 2021', date_formats=['%b %d, %Y'], languages=['es'])
datetime.datetime(2021, 8, 1, 0, 0)
>>> parse('Ago 01, 2021', date_formats=['%b %d, %Y'], locales=['es'])
datetime.datetime(2021, 8, 1, 0, 0)
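If upgrading is not possible, a standard-library fallback for this one format may be enough. The sketch below is not dateparser's method. It assumes every date looks like 'Ago 01, 2021' (a three-letter Spanish month abbreviation, a day, a comma, and a four-digit year). The names SPANISH_MONTHS and parse_spanish_date are hypothetical and introduced only for illustration.

```python
import re
from datetime import datetime

# Hypothetical lookup table: three-letter Spanish month abbreviations.
SPANISH_MONTHS = {
    "ene": 1, "feb": 2, "mar": 3, "abr": 4, "may": 5, "jun": 6,
    "jul": 7, "ago": 8, "sep": 9, "oct": 10, "nov": 11, "dic": 12,
}

# Matches strings such as 'Ago 01, 2021' (an optional dot after the month).
_DATE_RE = re.compile(r"^\s*([A-Za-z]{3})\.?\s+(\d{1,2}),\s*(\d{4})\s*$")


def parse_spanish_date(text):
    """Parse 'Mon DD, YYYY' with a Spanish month abbreviation.

    Returns a datetime, or None when the text does not match the assumed
    format, the month is unknown, or the day is invalid for that month.
    """
    match = _DATE_RE.match(text)
    if match is None:
        return None
    month = SPANISH_MONTHS.get(match.group(1).lower())
    if month is None:
        return None
    try:
        return datetime(int(match.group(3)), month, int(match.group(2)))
    except ValueError:
        # For example, day 30 in February.
        return None


print(parse_spanish_date("Ago 01, 2021"))
```

Returning None on failure mirrors how parse signals an unparseable string, so the fallback can be tried first or second without changing the calling code much.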