python pandas dataframe replace fillna

How to use the value of one column as part of a string to fill NaNs in another column?


Let's say I have the following df:

   year  date_until
1  2010  -
2  2011  30.06.13
3  2011  NaN
4  2015  30.06.18
5  2020  -

I'd like to fill every - and NaN in the date_until column with 30/06/{year + 1}, where year comes from the same row. I tried the following, but it inserts the whole year column into the string instead of the value from the corresponding row:

df['date_until'] = df['date_until'].str.replace('-', f'30/06/{df["year"]+1}')

My final goal is to calculate the difference between year and the year of date_until, so the step above may not even be necessary.


Solution

  • We can use pd.to_datetime here with errors='coerce', which turns every value that does not match the format (here - and NaN) into NaT. Then use dt.year to calculate the difference:

    df['date_until'] = pd.to_datetime(df['date_until'], format='%d.%m.%y', errors='coerce')
    df['diff_year'] = df['date_until'].dt.year - df['year']
    
       year date_until  diff_year
    0  2010        NaT        NaN
    1  2011 2013-06-30        2.0
    2  2011        NaT        NaN
    3  2015 2018-06-30        3.0
    4  2020        NaT        NaN
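
If you do want the gaps filled rather than left as NaT, here is a minimal sketch of one way to do it. It assumes the frame from the question, rebuilt here with a default 0-based index. The names missing, fallback, filled_text, parsed and fallback_dates are only illustrative. The key point is that the replacement is built as a Series, so each row gets its own year, unlike the f-string in the question, which formats the whole column at once.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed: default 0-based index)
df = pd.DataFrame(
    {
        "year": [2010, 2011, 2011, 2015, 2020],
        "date_until": ["-", "30.06.13", np.nan, "30.06.18", "-"],
    }
)

# Rows that need filling: real NaN values and the "-" placeholder
missing = df["date_until"].isna() | df["date_until"].eq("-")

# Per-row replacement string; the concatenation is element-wise,
# so every row uses its own year + 1
fallback = "30/06/" + (df["year"] + 1).astype(str)

# String version: what the question literally asks for
filled_text = df["date_until"].mask(missing, fallback)

# Datetime version: parse the existing dates, then fill the gaps
# with 30 June of year + 1 and compute the year difference
parsed = pd.to_datetime(df["date_until"], format="%d.%m.%y", errors="coerce")
fallback_dates = pd.to_datetime(fallback, format="%d/%m/%Y")
df["date_until"] = parsed.fillna(fallback_dates)
df["diff_year"] = df["date_until"].dt.year - df["year"]
```

Note that the string version leaves you with two date formats in one column (30.06.13 next to 30/06/2011), so if the year difference is all you need, the datetime version is probably the safer route. With it, the filled rows come out with a difference of 1 by construction.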