Search code examples
pythonpython-3.xdate-formatmysql-connector-python

More Effecient Way to Format Dates for in Python for MySQL


Say I have a list of dates that are formatted in one of the five ways below:

Date_Type_1 = 2001 Apr 15
Date_Type_2 = 2001 Apr
Date_Type_3 = 2000 Spring
Date_Type_4 = 2000 Nov-Dec
Date_Type_5 = 2001

So that my date list will look like this.

Date_list = ["2001 Apr 15", "2004 May 15", "2011 Jan", "2011 Fall", "2000 Nov-Dec", "2012", "2000 Spring" ]

I now want to try and format these dates so they can be passed to a MySQL database. The only way I know how is to use a lot of condition flow. Here is an example of how I would do this. I will not include ALL conditions as it would take up too much space.

for i in Date_list:
  year = i[:4]
  #This will be my conditional logic to define a month.
  #This is only an example.  Will not include all 12 months, 4 seasons, and various bi-months
  if "Apr" in i and "Mar-Apr" not in i:
     month = 4
  if "Mar-Apr" in i:
     month = 3
  if "May" in i and "May-Jun" not in i:
     month = 5
  if "Apr-May" in i:
     month = 4
  if "Spring" in i:
     month = 3
  #This will be conditional logic to define the day.
  #I would do this for each of the 31 days of the month.
  if "15" in i and "2015" not in i:
     day = 15

 date_return = datetime.datetime(year,month,day)
 date_format = date_return.date().isoformat

The issue with this is that I am making a number of assumptions. I am okay with defining seasons "Spring/Summer.." and the bi-monthly (e.g. Mar/Apr) returns as specific months. The issue, at least with defining days is it will not catch days if:

test_list = [2011 May, 2015 Apr 15]
for i in test_list:
  if "15" in i and "2015" not in i:
    day = 15

This will not catch the day. I am wondering if there is a more efficient way to do this? This current method would require 50+ conditional statements to define the day/months.


Solution

  • You should use the Python regular expression module re. It's a lot better for this than trying to mess with slices and in.

    import re
    
    MONTHS = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 
        'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    SEASONS = { 'Spring': 'Mar', 'Summer': 'Jun', 'Fall': 'Sep', 
        'Autumn': 'Sep', 'Winter': 'Dec' }
    
    _MONTHS_RE = '|'.join(MONTHS)
    _SEASONS_RE = '|'.join(SEASONS)
    DATE_RE = re.compile(r"""(?ax) # ASCII-only verbose mode
        (?P<year>20[0-9]{2}) # Year
        ( # followed by either...
            (?P<month>""" + _MONTHS_RE + r""") # a month name then...
            (   -(?P<endmonth>""" + _MONTHS_RE + r""") # a month range
            |   (?P<day>[1-9][0-9]?) # a day number
            )? # range and day are optional 
        |   (?P<season>""" + '|'.join(SEASONS) + r""") # or a season.
        )""")
    
    def parse_date(datestr):
        m = DATE_RE.match(datestr)
        if m is None:
            return # Didn't match
        md = m.groupdict()
        year = int(md["year"])
        if "season" in md:
            month = SEASONS[md["season"]]
        else:
            month = md["month"]
            if "endmonth" in md:
                # handle month range here.
            day = int(md.get("month", 1))
        return year, month, day
        # Month is returned as a string; to get a number, use:
        return year, MONTHS.index(month) + 1, day
    

    Note that this doesn't ensure that the date exists; it will accept "2099 Jun 50" or many other bad dates. But I'll leave filtering that as an exercise for the reader.