Let's say I have the following df:
   year date_until
1  2010          -
2  2011   30.06.13
3  2011        NaN
4  2015   30.06.18
5  2020          -
I'd like to fill all - and NaN values in the date_until column with 30/06/{year+1}. I tried the following, but it uses the whole year column instead of the value from the corresponding row:
df['date_until'] = df['date_until'].str.replace('-', f'30/06/{df["year"]+1}')
My final goal is to calculate the difference between year and the year of date_until, so maybe the step above isn't even necessary.
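For the literal fill described above, the replacement has to be built from the year column as a Series rather than inside an f-string, because the f-string is evaluated once and interpolates the text of the entire column. A minimal sketch, assuming the filled values should use the same DD.MM.YY format as the existing dates (instead of 30/06/...) so the column can later be parsed with a single format:

```python
import numpy as np
import pandas as pd

# Sample frame from the question ('-' and NaN mark missing end dates)
df = pd.DataFrame(
    {"year": [2010, 2011, 2011, 2015, 2020],
     "date_until": ["-", "30.06.13", np.nan, "30.06.18", "-"]},
    index=[1, 2, 3, 4, 5],
)

# One replacement string per row: "30.06." + last two digits of year + 1.
# (df["year"] + 1) is a Series, so the concatenation stays aligned by row.
fallback = "30.06." + (df["year"] + 1).astype(str).str[-2:]

# Rows holding '-' or NaN take the value from the matching row of fallback
missing = df["date_until"].isna() | df["date_until"].eq("-")
df["date_until"] = df["date_until"].where(~missing, fallback)
print(df)
```

Series.where keeps the original value where the condition is True and otherwise takes the index-aligned value from fallback, so each row gets its own year + 1.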
We can use pd.to_datetime here with errors='coerce', which turns values that don't match the format (the - and NaN entries) into NaT instead of raising an error. Then use dt.year to calculate the difference:
import pandas as pd
# '-' and NaN don't match '%d.%m.%y', so errors='coerce' turns them into NaT
df['date_until'] = pd.to_datetime(df['date_until'], format='%d.%m.%y', errors='coerce')
df['diff_year'] = df['date_until'].dt.year - df['year']
   year date_until  diff_year
0  2010        NaT        NaN
1  2011 2013-06-30        2.0
2  2011        NaT        NaN
3  2015 2018-06-30        3.0
4  2020        NaT        NaN
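This leaves NaN in diff_year for the rows that had - or NaN. If those rows should instead count as ending on 30 June of the following year, as the question originally asked, one option is to fill the NaT values after parsing rather than editing strings beforehand. A minimal sketch, assuming the sample data from the question is representative; default_end is just an illustrative name:

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    {"year": [2010, 2011, 2011, 2015, 2020],
     "date_until": ["-", "30.06.13", np.nan, "30.06.18", "-"]}
)

# Parse DD.MM.YY; '-' and NaN become NaT
df["date_until"] = pd.to_datetime(df["date_until"], format="%d.%m.%y", errors="coerce")

# Default end date per row: 30 June of year + 1, assembled from component columns
default_end = pd.to_datetime(
    pd.DataFrame({"year": df["year"] + 1, "month": 6, "day": 30})
)
df["date_until"] = df["date_until"].fillna(default_end)

# No NaT remains, so the difference is an integer column instead of float
df["diff_year"] = df["date_until"].dt.year - df["year"]
print(df)
```

With this sample, diff_year comes out as 1, 2, 1, 3, 1, and the filled rows no longer produce NaN.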