
Modify dateutil.parser.parse parameters to correct date misidentification


Dateutil parse misidentifies string as date despite modification of parserinfo.JUMP

I'm making a function that detects if a value is a date (given a variety of different formats). dateutil.parser.parse is excellent for this but it is misidentifying some of my data as a date.

Specifically, it parses 'ad-3' as datetime.datetime(2019, 10, 3, 0, 0), which is the 3rd of the current month and year.

I've tried modifying parserinfo to exclude 'ad'

Based on the documentation for dateutil, parse takes a parserinfo argument.

I believe the issue is that the default parserinfo class passed to parse contains 'ad' in its JUMP property. I've tried instantiating my own parserinfo, modifying its JUMP list and passing it to parse, but the issue persists.

from dateutil import parser

def check_format(val, pi = parser.parserinfo()):

    # Check date
    try:
        parser.parse(val, parserinfo=pi)
        return 'date'
    except ValueError:
        return 'Not date'

default_pi = parser.parserinfo()
my_pi = parser.parserinfo()
my_pi.JUMP = [j for j in my_pi.JUMP if j not in ['ad']]

print('Default JUMP List:')
print(default_pi.JUMP) # Print the default JUMP list; 'ad' is part of it

print('My Corrected JUMP List')
print(my_pi.JUMP) # Print the modified JUMP list; 'ad' has been excluded

print('Return using default JUMP list:')
print(check_format('ad-3')) # Using default parserinfo

print('Return using my JUMP list:')
print(check_format('ad-3', my_pi)) # Using my parserinfo

print('Control test with a normal string:')
print(check_format('sad-3', my_pi))

Issue:

check_format('ad-3', my_pi) returns 'date' even though the parserinfo instance passed in excludes 'ad' from its JUMP list.

As a control I've passed a similar string 'sad-3' and the output is as expected: 'Not date'.

Output:

Default JUMP List: [' ', '.', ',', ';', '-', '/', "'", 'at', 'on', 'and', 'ad', 'm', 't', 'of', 'st', 'nd', 'rd', 'th']

My Corrected JUMP List [' ', '.', ',', ';', '-', '/', "'", 'at', 'on', 'and', 'm', 't', 'of', 'st', 'nd', 'rd', 'th']

Return using default JUMP list: date

Return using my JUMP list: date

Control test with a normal string: Not date
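
The output above shows the printed list changing while the parse result does not. A minimal check, assuming the `jump()` lookup method on `parserinfo` is what the parser consults for each token, compares the edited list with what the instance actually reports:

```python
from dateutil import parser

info = parser.parserinfo()
# Same edit as in the question: rebind JUMP on an already-created instance.
info.JUMP = [j for j in info.JUMP if j != 'ad']

in_list = 'ad' in info.JUMP      # what the printed list shows
still_jumped = info.jump('ad')   # what the instance answers when asked about the token

print(in_list)       # False
print(still_jumped)  # True
```

If the two values disagree, the instance is answering from a lookup table built when it was created, not from the JUMP list edited afterwards.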


Solution

  • I was googling something else but stumbled upon your question so I'll answer it.

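    JUMP and the other lookup lists are class attributes, and parserinfo reads them in __init__ when an instance is created. Changing JUMP on an existing instance therefore has no effect. Set the list on the class (or a subclass) first, then instantiate it.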

    from dateutil import parser

    def check_format(val, pi):
        # pi is a parserinfo class; it is instantiated here, after JUMP has been set
        try:
            parser.parse(val, parserinfo=pi())
            return 'date'
        except ValueError:
            return 'Not date'

    default_pi = parser.parserinfo

    # Subclass so that the edited JUMP list does not also change the default class
    class my_pi(parser.parserinfo):
        JUMP = [j for j in parser.parserinfo.JUMP if j not in ['ad']]

    print('Default JUMP List:')
    print(default_pi.JUMP)  # The default JUMP list still contains 'ad'

    print('My Corrected JUMP List')
    print(my_pi.JUMP)  # The modified JUMP list no longer contains 'ad'

    print('Return using default JUMP list:')
    print(check_format('ad-3', default_pi))  # Using default parserinfo

    print('Return using my JUMP list:')
    print(check_format('ad-3', my_pi))  # Using my parserinfo

    print('Control test with a normal string:')
    print(check_format('sad-3', my_pi))
    

    check_format('ad-3', my_pi), which returned 'date' in the question, now returns 'Not date'.
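
A subclass keeps the change local to one parserinfo type and leaves `parser.parserinfo` itself untouched. The sketch below wraps that idea in a reusable check; `NoAdParserInfo` and `is_date` are hypothetical names chosen for illustration, not part of dateutil.

```python
from dateutil import parser


class NoAdParserInfo(parser.parserinfo):
    """Hypothetical parserinfo subclass that does not skip the token 'ad'."""
    # parserinfo.__init__ reads JUMP from the class, so override it here.
    JUMP = [j for j in parser.parserinfo.JUMP if j != 'ad']


def is_date(value, info=None):
    """Return True if dateutil can parse value (hypothetical helper)."""
    try:
        parser.parse(value, parserinfo=info)
        return True
    except (ValueError, OverflowError):
        return False


strict = NoAdParserInfo()

print(is_date('ad-3'))                 # True: default parserinfo skips 'ad'
print(is_date('ad-3', strict))         # False
print(is_date('2019-10-03', strict))   # True
print(parser.parserinfo().jump('ad'))  # True: the base class is unchanged
```

Creating the instance once and passing it around avoids rebuilding the lookup tables on every call, and any other code that relies on the default parser keeps its original behaviour.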