
Convert to_datetime when days don't contain leading zero


I'm trying to get the index of my DataFrame to be of type datetime. My CSV file contains separate Date and Time columns, which I combine on import:

df = pd.read_csv("example.csv", sep=";", decimal=",", parse_dates=[["Date", "Time"]])

It will look like this after the import:

Date_Time
0 1012020 00:00:00
1 1012020 00:15:00

The problem is the missing leading zero on the first nine days of each month. As far as I can tell, pandas' to_datetime() needs a leading zero for the %d format option to work. When I use format="%d%m%Y%H:%M:%S", Python says "invalid syntax".

How can I convert this column to datetime?


Solution

  • Use Series.str.zfill (as suggested by @FObersteiner in the comments) to left-pad every string with zeros to 17 characters (8 date digits, a space, and 8 time characters), then apply pd.to_datetime:

    import pandas as pd
    
    # The 2nd value is changed to '12012020 00:15:00' to show that
    # only the 7-digit 1st value gets padded
    data = {'Date_Time': {0: '1012020 00:00:00', 1: '12012020 00:15:00'}}
    
    df = pd.DataFrame(data)
    
    df['Date_Time'] = pd.to_datetime(df["Date_Time"].str.zfill(17), 
                                     format="%d%m%Y %H:%M:%S")
    
    print(df)
    #             Date_Time
    # 0 2020-01-01 00:00:00
    # 1 2020-01-12 00:15:00

    print(df['Date_Time'].dtype)
    # datetime64[ns]
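
Since the question asks for a datetime index built from separate Date and Time columns, the same padding idea can be applied before the columns are combined. The following is only a minimal sketch: the inline CSV text and the Value column are hypothetical stand-ins for example.csv, whose real layout is not shown in the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for "example.csv"; the Value column is an assumption.
csv_text = "Date;Time;Value\n1012020;00:00:00;1,5\n12012020;00:15:00;2,5\n"

# Read Date and Time as plain strings so nothing is parsed on import.
df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",",
                 dtype={"Date": str, "Time": str})

# Pad the date part to 8 digits, then join it with the time part.
stamp = df["Date"].str.zfill(8) + " " + df["Time"]

# Parse with an explicit format and use the result as the index.
df.index = pd.to_datetime(stamp, format="%d%m%Y %H:%M:%S")
df.index.name = "Date_Time"
df = df.drop(columns=["Date", "Time"])
```

Padding only the Date column to 8 digits is equivalent to padding the combined string to 17 characters, and it keeps the fix independent of how the time part is written.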
    

    Another (admittedly unnecessarily complicated) option is to use a regex pattern that prepends a zero to every 7-digit "date", turning it into its 8-digit equivalent:

    df['Date_Time'] = pd.to_datetime(
        df['Date_Time'].replace(r'^(\d{7}\s)',r'0\1', regex=True),
        format="%d%m%Y %H:%M:%S")
    

    Explanation of r'^(\d{7}\s)':

    • ^ asserts the position at the start of the string
    • \d{7}\s matches 7 digits followed by a whitespace character
    • The enclosing parentheses turn this into a capturing group

    Explanation of r'0\1':

    • \1 refers back to the capturing group (the only group in this pattern), to which we prepend 0
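
To see the substitution in isolation, here is a small sketch that uses Python's built-in re module instead of pandas, with the two sample strings from above. The variable names are only illustrative.

```python
import re

# Same pattern and replacement as in the pandas call above.
pattern = r'^(\d{7}\s)'
replacement = r'0\1'

samples = ['1012020 00:00:00', '12012020 00:15:00']

# Only strings that start with exactly 7 digits before the space get a leading 0;
# the 8-digit string does not match the pattern and is returned unchanged.
padded = [re.sub(pattern, replacement, s) for s in samples]

print(padded)
# ['01012020 00:00:00', '12012020 00:15:00']
```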