I'm writing a function that detects whether a value is a date (in a variety of formats). `dateutil.parser.parse` is excellent for this, but it misidentifies some of my data as dates.
Specifically, it parses `'ad-3'` as `datetime.datetime(2019, 10, 3, 0, 0)`, which is the 3rd of the current month and year.
According to the dateutil documentation, `parse` takes a `parserinfo` argument.
I believe the problem is that the default `parserinfo` class passed to `parse` contains `'ad'` in its `JUMP` list. I've tried instantiating my own `parserinfo`, modifying its `JUMP` list, and passing it to `parse`, but the issue persists.
```python
from dateutil import parser

def check_format(val, pi=parser.parserinfo()):
    # Check whether dateutil can parse the value as a date
    try:
        parser.parse(val, parserinfo=pi)
        return 'date'
    except ValueError:
        return 'Not date'

default_pi = parser.parserinfo()
my_pi = parser.parserinfo()
my_pi.JUMP = [j for j in my_pi.JUMP if j not in ['ad']]

print('Default JUMP List:')
print(default_pi.JUMP)  # Print the default JUMP list; you can see 'ad' is part of it
print('My Corrected JUMP List')
print(my_pi.JUMP)  # Print the modified JUMP list; 'ad' has been successfully excluded
print('Return using default JUMP list:')
print(check_format('ad-3'))  # Using the default parserinfo
print('Return using my JUMP list:')
print(check_format('ad-3', my_pi))  # Using my parserinfo
print('Control test with a normal string:')
print(check_format('sad-3', my_pi))
```
`check_format('ad-3', my_pi)` returns `'date'` even though the `parserinfo` instance I pass excludes `'ad'` from its `JUMP` list.
As a control, I passed the similar string `'sad-3'`, and the output is as expected: `'Not date'`.
Output:
```
Default JUMP List:
[' ', '.', ',', ';', '-', '/', "'", 'at', 'on', 'and', 'ad', 'm', 't', 'of', 'st', 'nd', 'rd', 'th']
My Corrected JUMP List
[' ', '.', ',', ';', '-', '/', "'", 'at', 'on', 'and', 'm', 't', 'of', 'st', 'nd', 'rd', 'th']
Return using default JUMP list:
date
Return using my JUMP list:
date
Control test with a normal string:
Not date
```
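I was googling something else but stumbled upon your question, so I'll answer it.
`JUMP` and the other word lists on `parserinfo` are class attributes, and `parserinfo.__init__` converts them into internal lookup tables when the instance is created. Reassigning `JUMP` on an instance afterwards therefore has no effect on parsing. You need to change the list on the class before the instance is built, i.e. do things in the proper order.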
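A minimal sketch of why the question's approach fails, assuming a dateutil 2.x release where `parserinfo.jump()` checks words against a table that `__init__` builds from the class-level `JUMP` list:

```python
from dateutil import parser

pi = parser.parserinfo()
# Rebinding JUMP on the instance only shadows the class attribute...
pi.JUMP = [j for j in pi.JUMP if j != 'ad']
print('ad' in pi.JUMP)  # False
# ...but jump() consults the lookup table __init__ already built
# from the class-level list, so 'ad' is still treated as filler
print(pi.jump('ad'))    # True
```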
```python
from dateutil import parser

def check_format(val, pi):
    # pi is a parserinfo *class*; instantiating it here lets __init__
    # build its lookup tables from the class-level JUMP list
    try:
        parser.parse(val, parserinfo=pi())
        return 'date'
    except ValueError:
        return 'Not date'

# Override JUMP on a subclass so the stock parser.parserinfo is left untouched
# (assigning to parser.parserinfo.JUMP directly would change it for everyone)
class NoAdParserInfo(parser.parserinfo):
    JUMP = [j for j in parser.parserinfo.JUMP if j not in ['ad']]

default_pi = parser.parserinfo
my_pi = NoAdParserInfo

print('Default JUMP List:')
print(default_pi.JUMP)  # Print the default JUMP list; you can see 'ad' is part of it
print('My Corrected JUMP List')
print(my_pi.JUMP)  # Print the modified JUMP list; 'ad' has been successfully excluded
print('Return using default JUMP list:')
print(check_format('ad-3', default_pi))  # Using the default parserinfo
print('Return using my JUMP list:')
print(check_format('ad-3', my_pi))  # Using my parserinfo
print('Control test with a normal string:')
print(check_format('sad-3', my_pi))
```
Previously, `check_format('ad-3', my_pi)` returned `'date'` even though `my_pi` excluded `'ad'` from its `JUMP` list. Now it returns `'Not date'`.
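Overriding `JUMP` on a subclass keeps the change local: the stock `parser.parserinfo` still treats `'ad'` as a jump word for every other caller. A small factory makes this reusable for any set of words. This is only a sketch; `parserinfo_without` and `is_date` are hypothetical helpers, not part of dateutil's API, and it assumes the class-level `JUMP` behaviour described above.

```python
from dateutil import parser


def parserinfo_without(*words):
    """Hypothetical helper: build a parserinfo subclass whose JUMP list
    omits the given words (compared case-insensitively)."""
    drop = {w.lower() for w in words}
    kept = [j for j in parser.parserinfo.JUMP if j.lower() not in drop]

    class FilteredParserInfo(parser.parserinfo):
        # Class-level override: parserinfo.__init__ builds its jump lookup
        # from this list, so every instance of the subclass skips the words
        JUMP = kept

    return FilteredParserInfo


def is_date(val, info_cls=parser.parserinfo):
    """Hypothetical helper: True if dateutil can parse val with info_cls."""
    try:
        parser.parse(val, parserinfo=info_cls())
        return True
    except (ValueError, OverflowError):  # newer dateutil raises ParserError, a ValueError subclass
        return False


NoAd = parserinfo_without('ad')
print(is_date('ad-3'))                 # True: the default treats 'ad' as filler
print(is_date('ad-3', NoAd))           # False
print(parser.parserinfo().jump('ad'))  # True: the stock class is untouched
```

Creating the instance inside `is_date` (rather than as a default argument) also means each call gets a parserinfo built from the class's current `JUMP` list.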