I have a data set which is structured as follows
date1 header1 date2 header2 date3 header3 ...
11.12.23 100 11.12.23 90 08.12.23 95 ...
11.12.23 100 08.12.23 89 08.12.23 95 ...
08.12.23 95 08.12.23 89 07.12.23 93 ...
I want to drop duplicate values for each individual time series ([date1, header1]; [date2, header2], etc.). Then, I want to merge the different time series by dates as index. The resulting data set should be structures as follows
date header1 header2 header3 ...
11.12.23 100 90 NA ...
08.12.23 95 89 95 ...
07.12.23 NA NA 93 ...
Do you have any ideas how this can be achieved using python/pandas? Thanks.
Edit: I just recognized a mistake in the values for header3, which is now corrected.
Another possible solution, which uses list comprehension
, drop_duplicates
and pandas.concat
:
n = 3 # number of column pairs
pd.concat([df.iloc[:, i:(i+2):1].drop_duplicates()
.rename(columns={df.columns[i]: 'Date'}).set_index('Date')
for i in range(0, 2*n, 2)], axis=1).reset_index()
Output:
Date header1 header2 header3
0 11.12.23 100.0 90.0 NaN
1 08.12.23 95.0 89.0 95.0
2 07.12.23 NaN NaN 93.0