Search code examples
pythonpandasjoinmergetime-series

Merge different time series containing duplicate values in multiple columns


I have a data set which is structured as follows

date1     header1  date2     header2  date3     header3  ... 
11.12.23  100      11.12.23  90       08.12.23  95       ...
11.12.23  100      08.12.23  89       08.12.23  95       ...
08.12.23  95       08.12.23  89       07.12.23  93       ...

I want to drop duplicate values for each individual time series ([date1, header1]; [date2, header2], etc.). Then, I want to merge the different time series by dates as index. The resulting data set should be structures as follows

date      header1  header2  header3  ... 
11.12.23  100      90       NA       ...
08.12.23  95       89       95       ...
07.12.23  NA       NA       93       ...     

Do you have any ideas how this can be achieved using python/pandas? Thanks.

Edit: I just recognized a mistake in the values for header3, which is now corrected.


Solution

  • Another possible solution, which uses list comprehension, drop_duplicates and pandas.concat:

    n = 3 # number of column pairs
    
    pd.concat([df.iloc[:, i:(i+2):1].drop_duplicates()
               .rename(columns={df.columns[i]: 'Date'}).set_index('Date')
               for i in range(0, 2*n, 2)], axis=1).reset_index()
    

    Output:

           Date  header1  header2  header3
    0  11.12.23    100.0     90.0      NaN
    1  08.12.23     95.0     89.0     95.0
    2  07.12.23      NaN      NaN     93.0