
Handle ValueError while creating a date in pandas


I'm reading a CSV file with the columns p, day, and month into a DataFrame. The goal is to build a date from the day, the month, and the current year, but I run into this error for the 29th of February:

ValueError: cannot assemble the datetimes: day is out of range for month

When this error occurs, I would like to replace the day with the previous day. How can I do that? Below are a few rows of my DataFrame; the datex column at the end shows what I would like to get:

        p  day month  year datex
0      p1  29    02  2021  28Feb-2021
1      p2  18    07  2021  18Jul-2021
2      p3  12    09  2021  12Sep-2021

Right now my code for the date is just the line below, so I get NaT wherever the date doesn't exist.

df['datex'] = pd.to_datetime(df[['year', 'month', 'day']], errors='coerce')

Solution

  • You could try something like this:

    df['datex'] = pd.to_datetime(df[['year', 'month', 'day']], errors='coerce')
    

    As expected, the invalid date comes out as NaT:

        p  day  year  month      datex
    0  p1   29  2021      2        NaT
    1  p2   18  2021      7 2021-07-18
    2  p3   12  2021      9 2021-09-12
    

    You could then handle these NaT rows as a special case:

    df.loc[df.datex.isnull(), 'previous_day'] = df.day - 1
    
        p  day  year  month      datex  previous_day
    0  p1   29  2021      2        NaT          28.0
    1  p2   18  2021      7 2021-07-18           NaN
    2  p3   12  2021      9 2021-09-12           NaN
    
    df.loc[df.datex.isnull(), 'datex'] = pd.to_datetime(df[['previous_day', 'year', 'month']].rename(columns={'previous_day': 'day'}))
    
        p  day  year  month      datex  previous_day
    0  p1   29  2021      2 2021-02-28          28.0
    1  p2   18  2021      7 2021-07-18           NaN
    2  p3   12  2021      9 2021-09-12           NaN
    

    Storing the adjusted value in a separate column (previous_day here) is what keeps day = 29 intact in the original day column.
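
The steps above only move back one day, so a row such as 30 February would still end up as NaT. As a possible variation (not part of the original answer), here is a minimal sketch that clamps the day to the last valid day of its month, without adding a helper column to the DataFrame. The sample data and the function name assemble_clamped are assumptions made for illustration, and the columns are assumed to be convertible to integers.

```python
import pandas as pd

# Illustrative data mirroring the rows shown in the question.
df = pd.DataFrame({
    "p": ["p1", "p2", "p3"],
    "day": [29, 18, 12],
    "month": [2, 7, 9],
    "year": [2021, 2021, 2021],
})


def assemble_clamped(frame):
    """Hypothetical helper: build dates, clamping day to the month's last day."""
    # Work on integer copies so string columns from a CSV are handled too.
    nums = frame[["year", "month", "day"]].astype(int)
    # The first of the month is always a valid date.
    month_start = pd.to_datetime(nums[["year", "month"]].assign(day=1))
    # Number of days in each row's month (28 for February 2021).
    last_day = month_start.dt.days_in_month
    # Cap the day at the month length; the original 'day' column is untouched.
    safe_day = nums["day"].clip(upper=last_day)
    return pd.to_datetime(nums[["year", "month"]].assign(day=safe_day))


df["datex"] = assemble_clamped(df)

# Optional string form like the one in the question; the month
# abbreviation produced by %b depends on the active locale.
df["datex_str"] = df["datex"].dt.strftime("%d%b-%Y")
```

With this approach the day column still holds 29 for p1, while datex holds the clamped date. Days below 1 are not handled and would still raise an error.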