I am using Python with Scrapy to collect a bunch of dates that are stored on a web page as text strings like "11th November" (no year is provided).
I was trying to use
startdate = '11th November'
datetime.strptime(startdate, '%d %B')
but I don't think it likes the 'th' and I get a
ValueError: time data '11th November' does not match format '%d %B'
If I write a function to strip the th, st, rd and nd from the day, I figure it will also strip the same letters from the month name (the "st" in "August", for example).
Is there a better way to approach turning this into a date format?
For my use, it ultimately needs to be in the ISO 8601 format YYYY-MM-DD.
This is so that I can pipe it from Scrapy to a database, and from there use it in a Google Spreadsheet for a JavaScript Google chart. I mention this only because there may be a better place to make the string-to-date conversion than in Python.
(As a secondary issue, I also need to figure out how to add the right year to the date, given that if it says 12th January that would mean Jan 2020 and not 2019. This will be based on a comparison with the date when the scrape runs, i.e. today's date.)
EDIT: it turned out that the solution required the secondary issue to be addressed as well, hence the choice of final answer to this question. When no year was supplied, the parsed date defaulted to 1900, which was a problem.
Try this out -
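import re
import datetime
# Strip the ordinal suffix (st, nd, rd, th) only where it follows the day digits, then append the current year before parsing.
datetime_obj = datetime.datetime.strptime(re.sub(r"\b([0123]?[0-9])(st|th|nd|rd)\b", r"\1", startdate) + " " + str(datetime.datetime.now().year), "%d %B %Y")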
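The snippet above always appends the current year, so it does not yet cover the January rollover or the YYYY-MM-DD output the question asks for. Here is a minimal sketch of one way to handle both, assuming every scraped date refers to today or later. The names parse_day_month and today are hypothetical, introduced only for illustration:

```python
import re
from datetime import date, datetime


def parse_day_month(text, today=None):
    """Turn a string like '11th November' into an ISO 8601 date string.

    Hypothetical helper; `today` is injectable so the rollover can be tested.
    """
    today = today or date.today()
    # Remove the ordinal suffix only when it directly follows the day digits,
    # so the "st" in "August" is left alone.
    cleaned = re.sub(r"\b(\d{1,2})(st|nd|rd|th)\b", r"\1", text.strip())
    # Parse with an explicit year so strptime does not fall back to 1900.
    parsed = datetime.strptime(f"{cleaned} {today.year}", "%d %B %Y").date()
    # Assumption: scraped dates are upcoming, so a date that has already
    # passed this year is taken to mean next year.
    if parsed < today:
        parsed = parsed.replace(year=parsed.year + 1)
    return parsed.isoformat()


print(parse_day_month("12th January", today=date(2019, 11, 20)))
```

With the scrape date fixed at 20 November 2019, "12th January" comes out as 2020-01-12. Note that %B matches month names in the current locale, and that "29th February" would need extra handling, since it is not valid in every year.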