Say I have a list of dates that are formatted in one of the five ways below:
Date_Type_1 = 2001 Apr 15
Date_Type_2 = 2001 Apr
Date_Type_3 = 2000 Spring
Date_Type_4 = 2000 Nov-Dec
Date_Type_5 = 2001
So that my date list will look like this.
Date_list = ["2001 Apr 15", "2004 May 15", "2011 Jan", "2011 Fall", "2000 Nov-Dec", "2012", "2000 Spring" ]
I now want to try and format these dates so they can be passed to a MySQL database. The only way I know how is to use a lot of condition flow. Here is an example of how I would do this. I will not include ALL conditions as it would take up too much space.
for i in Date_list:
year = i[:4]
#This will be my conditional logic to define a month.
#This is only an example. Will not include all 12 months, 4 seasons, and various bi-months
if "Apr" in i and "Mar-Apr" not in i:
month = 4
if "Mar-Apr" in i:
month = 3
if "May" in i and "May-Jun" not in i:
month = 5
if "Apr-May" in i:
month = 4
if "Spring" in i:
month = 3
#This will be conditional logic to define the day.
#I would do this for each of the 31 days of the month.
if "15" in i and "2015" not in i:
day = 15
date_return = datetime.datetime(year,month,day)
date_format = date_return.date().isoformat
The issue with this is that I am making a number of assumptions. I am okay with defining seasons "Spring/Summer.." and the bi-monthly (e.g. Mar/Apr) returns as specific months. The issue, at least with defining days is it will not catch days if:
test_list = [2011 May, 2015 Apr 15]
for i in test_list:
if "15" in i and "2015" not in i:
day = 15
This will not catch the day. I am wondering if there is a more efficient way to do this? This current method would require 50+ conditional statements to define the day/months.
You should use the Python regular expression module re
. It's a lot better for this than trying to mess with slices and in
.
import re
MONTHS = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul',
'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
SEASONS = { 'Spring': 'Mar', 'Summer': 'Jun', 'Fall': 'Sep',
'Autumn': 'Sep', 'Winter': 'Dec' }
_MONTHS_RE = '|'.join(MONTHS)
_SEASONS_RE = '|'.join(SEASONS)
DATE_RE = re.compile(r"""(?ax) # ASCII-only verbose mode
(?P<year>20[0-9]{2}) # Year
( # followed by either...
(?P<month>""" + _MONTHS_RE + r""") # a month name then...
( -(?P<endmonth>""" + _MONTHS_RE + r""") # a month range
| (?P<day>[1-9][0-9]?) # a day number
)? # range and day are optional
| (?P<season>""" + '|'.join(SEASONS) + r""") # or a season.
)""")
def parse_date(datestr):
m = DATE_RE.match(datestr)
if m is None:
return # Didn't match
md = m.groupdict()
year = int(md["year"])
if "season" in md:
month = SEASONS[md["season"]]
else:
month = md["month"]
if "endmonth" in md:
# handle month range here.
day = int(md.get("month", 1))
return year, month, day
# Month is returned as a string; to get a number, use:
return year, MONTHS.index(month) + 1, day
Note that this doesn't ensure that the date exists; it will accept "2099 Jun 50"
or many other bad dates. But I'll leave filtering that as an exercise for the reader.