python datetime date-parsing python-dateutil

dateutil: default to the last occurrence of a recognized part, not the next


I am using dateutil.parser.parse to parse date strings which might contain partial information. If some information is missing, parse accepts a default keyword argument from which it fills in the missing fields. If no default is given, it uses today's date at midnight.

For a case like dateutil.parser.parse("Thursday"), this means it will return the date of the next Thursday. However, I need it to return the date of the last Thursday (including today, if today happens to be a Thursday).
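To make the behaviour concrete, here is a minimal sketch of how default fills in missing fields. The reference date and the variable names are only assumptions chosen for illustration:

```python
from datetime import datetime
from dateutil import parser

# Assumed reference date for illustration: Tuesday, 2018-02-20 at midnight
reference = datetime(2018, 2, 20)

# Only a time is given: the date comes from the default
filled_time = parser.parse("10:30", default=reference)

# Only month and day are given: the year comes from the default
filled_date = parser.parse("Jan 31", default=reference)

# Only a weekday is given: dateutil moves forward to that weekday
next_thursday = parser.parse("Thursday", default=reference)

print(filled_time)    # 2018-02-20 10:30:00
print(filled_date)    # 2018-01-31 00:00:00
print(next_thursday)  # 2018-02-22 00:00:00
```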

So, assuming today == datetime.datetime(2018, 2, 20) (a Tuesday), I would like to get all of these asserts to be true:

from dateutil import parser
from datetime import datetime

def parse(date_str, default=None):
    # this needs to be modified
    return parser.parse(date_str, default=default)

today = datetime(2018, 2, 20)

assert parse("Tuesday", default=today) == today    # True
assert parse("Thursday", default=today) == datetime(2018, 2, 15)    # False
assert parse("Jan 31", default=today) == datetime(2018, 1, 31)    # True
assert parse("December 10", default=today) == datetime(2017, 12, 10)    # False

Is there an easy way to achieve this? With the current parse function, only the first and third asserts pass.


Solution

  • Here's your modified code (code.py):

    #!/usr/bin/env python3
    
    import sys
    from dateutil import parser
    from datetime import datetime, timedelta
    
    
    today = datetime(2018, 2, 20)
    
    data = [
        ("Tuesday", today, today),
        ("Thursday", datetime(2018, 2, 15), today),
        ("Jan 31", datetime(2018, 1, 31), today),
        ("December 10", datetime(2017, 12, 10), today),
    ]
    
    
    def parse(date_str, default=None):
        # this needs to be modified
        return parser.parse(date_str, default=default)
    
    
    def _days_in_year(year):
        try:
            datetime(year, 2, 29)
        except ValueError:
            return 365
        return 366
    
    
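    def parse2(date_str, default=None):
        dt = parser.parse(date_str, default=default)
        if default is not None:
            # Flatten WEEKDAYS into lowercase names: ['mon', 'monday', 'tue', ...]
            weekday_strs = [day_str.lower() for day_tuple in parser.parserinfo.WEEKDAYS for day_str in day_tuple]
            if date_str.lower() in weekday_strs:
                # dateutil resolves a bare weekday to its next occurrence on or
                # after default, so a result later than default is one week ahead
                if dt > default:
                    dt -= timedelta(days=7)
            else:
                # Compare against default (not the global today), so the function
                # works for any reference date
                if (dt.month > default.month) or ((dt.month == default.month) and (dt.day > default.day)):
                    dt -= timedelta(days=_days_in_year(dt.year))
        return dt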
    
    
    def print_stats(parse_func):
        print("\nPrinting stats for \"{:s}\"".format(parse_func.__name__))
        for triple in data:
            d = parse_func(triple[0], default=triple[2])
            print("  [{:s}] [{:s}] [{:s}] [{:s}]".format(triple[0], str(d), str(triple[1]), "True" if d == triple[1] else "False"))
    
    
    if __name__ == "__main__":
        print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
        print_stats(parse)
        print_stats(parse2)
    

    Notes:

    • I restructured the code a bit to parametrize it, so that a change (e.g. adding a new example) requires minimal edits
      • Instead of asserts, I added a function (print_stats) that prints the results, rather than raising AssertionError and exiting the program at the first mismatch
        • Takes an argument (parse_func) which is a function that does the parsing (e.g. parse)
        • Uses some globally declared data (data) together with the (above) function
      • data - is a list of triples, where each triple contains:
        1. Text to be converted
        2. Expected datetime ([Python 3.Docs]: datetime Objects) to be yielded by the conversion
        3. default argument to be passed to the parsing function (parse_func)
    • parse2 function (an improved version of parse):

      • Accepts 2 types of date strings:
        1. Weekday name
        2. Month / Day (unordered)
      • Does the regular parsing, and if the converted object comes after the one passed as the default argument (that is determined by comparing the appropriate attributes of the 2 objects), it subtracts a period (take a look at [Python 3.Docs]: timedelta Objects):
        1. "Thursday" comes after "Tuesday", so it subtracts the number of days in a week (7)
        2. "December 10" comes after "February 20", so it subtracts the number of days in the year*
      • weekday_strs: easiest to explain by example:

        >>> parser.parserinfo.WEEKDAYS
        [('Mon', 'Monday'), ('Tue', 'Tuesday'), ('Wed', 'Wednesday'), ('Thu', 'Thursday'), ('Fri', 'Friday'), ('Sat', 'Saturday'), ('Sun', 'Sunday')]
        >>> [day_str.lower() for day_tuple in parser.parserinfo.WEEKDAYS for day_str in day_tuple]
        ['mon', 'monday', 'tue', 'tuesday', 'wed', 'wednesday', 'thu', 'thursday', 'fri', 'friday', 'sat', 'saturday', 'sun', 'sunday']
        
        • Flattens parser.parserinfo.WEEKDAYS
        • Converts strings to lowercase (for simplifying comparisons)
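As a possible alternative to correcting the parsed result afterwards, the weekday case can also be computed directly from the weekday index. This is only a sketch: last_weekday and WEEKDAY_INDEX are hypothetical names (not part of dateutil), and the input is assumed to be a bare weekday name listed in parser.parserinfo.WEEKDAYS:

```python
from datetime import datetime, timedelta
from dateutil import parser

# Map every known spelling ('mon', 'monday', ...) to its index (Monday == 0),
# matching the convention of datetime.weekday()
WEEKDAY_INDEX = {
    name.lower(): index
    for index, names in enumerate(parser.parserinfo.WEEKDAYS)
    for name in names
}


def last_weekday(name, default):
    """Hypothetical helper: most recent `name` weekday on or before `default`."""
    target = WEEKDAY_INDEX[name.strip().lower()]  # KeyError for unknown names
    days_back = (default.weekday() - target) % 7  # 0 when default is that weekday
    return default - timedelta(days=days_back)


today = datetime(2018, 2, 20)  # a Tuesday
print(last_weekday("Tuesday", today))   # 2018-02-20 00:00:00
print(last_weekday("Thursday", today))  # 2018-02-15 00:00:00
print(last_weekday("Mon", today))       # 2018-02-19 00:00:00
```

Because the offset is taken modulo 7, the result never lies after default, whichever weekday is requested.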
    • _days_in_year* - as you probably guessed, returns the number of days in a year (simply subtracting 365 would give wrong results when a leap day is involved):
      >>> dt = datetime(2018, 3, 1)
      >>> dt
      datetime.datetime(2018, 3, 1, 0, 0)
      >>> dt - timedelta(365)
      datetime.datetime(2017, 3, 1, 0, 0)
      >>> dt = datetime(2016, 3, 1)
      >>> dt
      datetime.datetime(2016, 3, 1, 0, 0)
      >>> dt - timedelta(365)
      datetime.datetime(2015, 3, 2, 0, 0)
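Note that subtracting a whole year's worth of days can still be off by one when the parsed date falls in January or February next to a leap year: what matters is whether a February 29 lies inside the interval being subtracted, not whether dt.year itself is a leap year. Below is a minimal sketch of an alternative that replaces the year field directly. one_year_back is a hypothetical helper, and mapping February 29 to February 28 is an assumption about the desired behaviour:

```python
from datetime import datetime, timedelta


def one_year_back(dt):
    """Hypothetical helper: same month and day, one calendar year earlier."""
    try:
        return dt.replace(year=dt.year - 1)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28
        # (an assumed policy, adjust if Mar 1 is preferred)
        return dt.replace(year=dt.year - 1, day=28)


dt = datetime(2017, 1, 15)                   # 2017 is not a leap year, but 2016 was
print(dt - timedelta(days=365))              # 2016-01-16 00:00:00 (one day off)
print(one_year_back(dt))                     # 2016-01-15 00:00:00
print(one_year_back(datetime(2016, 2, 29)))  # 2015-02-28 00:00:00
```

dateutil's relativedelta(years=1) offers similar calendar-aware arithmetic if an extra import is acceptable.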
      

    Output:

    (py35x64_test) E:\Work\Dev\StackOverflow\q048884480>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" code.py
    Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
    
    
    Printing stats for "parse"
      [Tuesday] [2018-02-20 00:00:00] [2018-02-20 00:00:00] [True]
      [Thursday] [2018-02-22 00:00:00] [2018-02-15 00:00:00] [False]
      [Jan 31] [2018-01-31 00:00:00] [2018-01-31 00:00:00] [True]
      [December 10] [2018-12-10 00:00:00] [2017-12-10 00:00:00] [False]
    
    Printing stats for "parse2"
      [Tuesday] [2018-02-20 00:00:00] [2018-02-20 00:00:00] [True]
      [Thursday] [2018-02-15 00:00:00] [2018-02-15 00:00:00] [True]
      [Jan 31] [2018-01-31 00:00:00] [2018-01-31 00:00:00] [True]
      [December 10] [2017-12-10 00:00:00] [2017-12-10 00:00:00] [True]