I am reading a text file in which one of the columns (column index 3 in my code) contains date values in different date formats, along with other data types (strings, numbers, etc.). I am identifying each unique date format in my input file and writing the values for each date format to a text file. One of the date formats I am looking for in my input file is Month dd, YYYY, for example May 1, 2001.
I am using Python's datetime.strptime() to search for the above date format in my file, and strftime() to write the matches out. However, my code below only outputs date values with zero-padding, for example May 01, 2001.
But I also need my program to find date values without zero-padding, which I know exist in my input file, such as:
May 1, 2001
2/4/2012
I have searched the forum for a solution, and most answers involved removing or replacing the zero. However, that does not solve my problem, because I am searching the file for both zero-padded and non-zero-padded date values.
I also tried adding a "-" sign after the "%" sign in my date format, but it did not work. My code is below. Can anyone point me in the right direction? I am using PyCharm CE 2017.2 with the Python 3.6.3 interpreter.
My Code:
# Identifying Month dd, yyyy date format
from datetime import datetime
import csv
# newline='' is what the csv module expects for files it reads and writes
with open('file1.csv', 'r', newline='') as f_input, open('file2.txt', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    for row in csv_input:
        for date_format in ['%B %d, %Y']:
            try:
                converted = datetime.strptime(row[3], date_format)
                csv_output.writerow([row[0], row[1], row[2], converted.strftime(date_format), row[4]])
            except ValueError:
                # row[3] did not match this format; try the next one
                pass
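For what it's worth, the zero-padding in the output comes from strftime(), not from strptime(): in CPython, the %d and %m directives of datetime.strptime() accept both padded and unpadded values, so "May 1, 2001" already matches '%B %d, %Y'. Below is a minimal sketch of grouping values by the first format that parses them. It assumes an English (or C) locale for %B, and the names `DATE_FORMATS` and `classify_date` are hypothetical.

```python
from datetime import datetime

# Hypothetical list of candidate formats; extend it with the formats in your file.
DATE_FORMATS = ['%B %d, %Y', '%m/%d/%Y']


def classify_date(value):
    """Return (format, datetime) for the first format that parses value, or None."""
    for date_format in DATE_FORMATS:
        try:
            return date_format, datetime.strptime(value, date_format)
        except ValueError:
            # Not this format; fall through to the next candidate.
            continue
    return None


# %d and %m accept unpadded values when parsing, so both spellings match.
for value in ['May 1, 2001', 'May 01, 2001', '2/4/2012', 'not a date']:
    print(value, '->', classify_date(value))
```

If you write the original cell value to the output, rather than `converted.strftime(...)`, the unpadded spelling from the input is kept as-is.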
It is very hard to do what you want reliably using datetime.strptime(). I can see from your code that you are already planning to use multiple date format strings (otherwise your format string wouldn't be in a one-element list).
Try this approach instead:
>>> from dateutil import parser
>>> parser.parse("May 1, 2001")
datetime.datetime(2001, 5, 1, 0, 0)
>>> parser.parse("May 01, 2001")
datetime.datetime(2001, 5, 1, 0, 0)
>>> parser.parse("2/4/2012")
datetime.datetime(2012, 2, 4, 0, 0)
>>> parser.parse("02/04/2012")
datetime.datetime(2012, 2, 4, 0, 0)
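One thing to watch with numeric dates: "2/4/2012" is ambiguous, and parser.parse() reads it month-first by default. If your file uses day-first dates, the `dayfirst` keyword flips that. A small check, assuming python-dateutil is installed:

```python
from dateutil import parser

# By default, a numeric date like 2/4/2012 is read as month/day/year.
month_first = parser.parse("2/4/2012")
# dayfirst=True reads the same string as day/month/year.
day_first = parser.parse("2/4/2012", dayfirst=True)

print(month_first.date())  # 2012-02-04
print(day_first.date())    # 2012-04-02
```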
So, instead of
for row in csv_input:
    for date_format in ['%B %d, %Y']:
        try:
            converted = datetime.strptime(row[3], date_format)
            csv_output.writerow([row[0], row[1], row[2], converted.strftime(date_format), row[4]])
        except ValueError:
            pass
do
for row in csv_input:
    converted = parser.parse(row[3])
    csv_output.writerow([row[0], row[1], row[2], converted.strftime('%B %d, %Y'), row[4]])
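Note that %d in strftime() always zero-pads the day, so '%B %d, %Y' produces "May 01, 2001". The "%-d" flag is not understood by strptime() at all, and in strftime() it depends on the platform's C library (it is not supported on Windows). A portable way to get an unpadded day is to build the string from the datetime's attributes. A sketch, with a hypothetical helper name and assuming an English locale for %B:

```python
from datetime import datetime


def format_month_day_year(d):
    """Format d as 'Month D, YYYY' without zero-padding the day (hypothetical helper)."""
    # {d:%B} calls d.strftime('%B'); d.day and d.year are plain ints, so no padding is added.
    return f"{d:%B} {d.day}, {d.year}"


print(format_month_day_year(datetime(2001, 5, 1)))  # May 1, 2001
```

In the loop above, you would write `format_month_day_year(converted)` in place of `converted.strftime('%B %d, %Y')`.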
This converts all of your input dates to one standard format, for example "May 01, 2001". I've omitted your try...except because it is there to catch the cases where strptime() fails, and parser.parse() does a competent job with whatever you pass it, as long as it isn't complete nonsense; in that case, you probably don't want to silently suppress the exception.
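That said, your question mentions that the column also holds strings and numbers. parser.parse() raises ValueError (or its ParserError subclass in recent dateutil releases) on text it cannot read, and OverflowError on out-of-range values; it may also accept some bare numbers as dates, filling in any missing fields from its default (today's date unless you pass `default=`). Below is a sketch that skips unparseable cells instead of stopping the whole run. It assumes the same five-column layout as your code, and `convert_rows` is a hypothetical name.

```python
import csv
import io

from dateutil import parser


def convert_rows(csv_input, csv_output):
    """Write rows whose column index 3 parses as a date; skip the rest."""
    for row in csv_input:
        try:
            converted = parser.parse(row[3])
        except (ValueError, OverflowError):
            # Not a date (or out of range): skip this row rather than crash.
            continue
        csv_output.writerow([row[0], row[1], row[2], converted.strftime('%B %d, %Y'), row[4]])


# In-memory demo; with real files, open them with newline='' as the csv docs recommend.
sample = io.StringIO('a,b,c,"May 1, 2001",e\nf,g,h,hello,i\nj,k,l,2/4/2012,m\n')
out = io.StringIO()
convert_rows(csv.reader(sample), csv.writer(out))
print(out.getvalue())
```

Whether to skip, log, or collect the rejected rows is a judgment call; skipping is just the simplest choice for a sketch.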
dateutil isn't in the standard library, but it's well worth installing (the package on PyPI is python-dateutil).