python datetime strftime

Identifying date values in a text file that do not have zero padding for month and day using python strftime()


I am reading a text file in which one of the columns (the 3rd) contains date values in several different formats, mixed with other data types (strings, numbers, etc.). I am identifying each distinct date format in my input file and writing the values for each format to a text file. One of the date formats I am looking for is Month dd, YYYY.

For example: May 1, 2001.

I am using Python's datetime.strptime() to detect this date format in my file and strftime() to write out the matches. However, the code provided below only outputs date values with zero padding.

For example: May 01, 2001

I also need my program to find the date values without zero padding that I know exist in my input file, such as:

May 1, 2001

2/4/2012

I have searched the forum for a solution, and most of the answers involve removing or replacing the zero. That does not solve my problem, because I am searching the file for both zero-padded and non-zero-padded date values.
I also tried adding a "-" after the "%" sign in my date format, but it did not work. My code is below. Can anyone point me in the right direction? I am using PyCharm CE 2017.2 with the Python 3.6.3 interpreter.

My Code:

# Identifying Month dd, yyyy date format

from datetime import datetime

import csv

with open('file1.csv', 'r') as f_input, open('file2.txt', 'w') as f_output:

    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    for row in csv_input:
        for date_format in ['%B %d, %Y']:
            try:
                converted = datetime.strptime(row[3], date_format)
                csv_output.writerow([row[0], row[1], row[2], converted.strftime(date_format), row[4]])
            except ValueError:
                pass

Solution

  • It is very hard to do what you want reliably using datetime.strptime(). I can see from your code that you are already planning to have multiple date format strings (otherwise your date format string wouldn't be in a one-element list).

    Try this approach instead:

    >>> from dateutil import parser
    >>> parser.parse("May 1, 2001")
    datetime.datetime(2001, 5, 1, 0, 0)
    >>> parser.parse("May 01, 2001")
    datetime.datetime(2001, 5, 1, 0, 0)
    >>> parser.parse("2/4/2012")
    datetime.datetime(2012, 2, 4, 0, 0)
    >>> parser.parse("02/04/2012")
    datetime.datetime(2012, 2, 4, 0, 0)
    

    So, instead of

    for row in csv_input:
        for date_format in ['%B %d, %Y']:
            try:
                converted = datetime.strptime(row[3], date_format)
                csv_output.writerow([row[0], row[1], row[2], converted.strftime(date_format), row[4]])
            except ValueError:
                pass
    

    do

    for row in csv_input:
        converted = parser.parse(row[3])
        csv_output.writerow([row[0], row[1], row[2], converted.strftime('%B %d, %Y'), row[4]])
    

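    This will convert all of your input dates to a standard format, for example "May 01, 2001" (strftime()'s %d always zero-pads the day). I've omitted your try...except because it was there to catch the cases where strptime() fails, and parser.parse() will do a competent job of whatever you pass it, as long as it isn't complete nonsense. For input it cannot interpret it raises ValueError, and in general you should not want to suppress that silently. Since your column also holds non-date strings and numbers, though, you will probably need a try...except ValueError around the call that skips or logs those rows. Also be aware that the parser is lenient: a short numeric string may be accepted as a partial date.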
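On the zero padding itself: strptime()'s %d also matches a single-digit day, so the unpadded values are being found; the leading zero is added by strftime() when the row is written. Below is a minimal sketch of writing the day without padding using only the standard library. The name format_unpadded is a hypothetical helper, not something from the question. The "%-d" directive is platform-specific in strftime() and is rejected by strptime(), which is probably why adding "-" did not work.

```python
from datetime import datetime


def format_unpadded(d):
    """Render a date as 'Month d, YYYY' with no leading zero on the day.

    Hypothetical helper for illustration. The month name comes from %B,
    so it follows the current locale (English in the default C locale).
    """
    return f"{d:%B} {d.day}, {d.year}"


# strptime() accepts the unpadded day even though the format says %d.
parsed = datetime.strptime("May 1, 2001", "%B %d, %Y")

print(parsed.strftime("%B %d, %Y"))  # strftime() zero-pads the day
print(format_unpadded(parsed))       # day written without padding
```

Building the string from d.day and d.year avoids relying on any platform-specific strftime() flag.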

    dateutil isn't in the standard library (the package on PyPI is called python-dateutil), but it's well worth the effort of installing.
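If the goal is still to report which format each value uses, including whether it was zero-padded, a standard-library sketch along these lines may help. It is one possible approach rather than a definitive one: CANDIDATE_FORMATS and classify_date are hypothetical names, and the format list is an assumption based on the two examples in the question. It parses with strptime(), then re-renders with strftime() and compares the result with the input, since the re-rendered text is always zero-padded.

```python
from datetime import datetime

# Assumed list of formats to try, in order; extend it to match your file.
CANDIDATE_FORMATS = ["%B %d, %Y", "%m/%d/%Y"]


def classify_date(value):
    """Return (parsed, fmt, zero_padded) for a date string, else None."""
    text = value.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not match; try the next one
        # strftime() always zero-pads, so an exact round trip (ignoring
        # letter case) means the input was already in padded form.
        zero_padded = parsed.strftime(fmt).casefold() == text.casefold()
        return parsed, fmt, zero_padded
    return None  # not a date in any of the candidate formats


for sample in ["May 1, 2001", "May 01, 2001", "2/4/2012", "not a date"]:
    print(repr(sample), "->", classify_date(sample))
```

Values such as "12/25/2012" need no padding and are reported as padded, because they already match the canonical form. Non-date strings return None instead of raising, so the loop over the file can skip them explicitly.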