Search code examples
pandasdataframenumpytime-series

Fastest way to map the values of a timeseries dataframe from another timeseries dataframe


I have to timeseries dataframes:

from pandas import Timestamp

df1 = pd.DataFrame(index=[Timestamp('2023-12-22 00:01:00'), Timestamp('2023-12-22 00:02:00')], columns = ['Peter', 'Adam'], data =[['A', 'C'], ['C', 'D']])

df2 = pd.DataFrame(index=[Timestamp('2023-12-22 00:01:00'), Timestamp('2023-12-22 00:01:00'), Timestamp('2023-12-22 00:01:00'), Timestamp('2023-12-22 00:01:00'), Timestamp('2023-12-22 00:02:00'), Timestamp('2023-12-22 00:02:00'), Timestamp('2023-12-22 00:02:00'), Timestamp('2023-12-22 00:02:00')], columns=['Level', 'Score'], data=[['A', 1], ['B', 2], ['C', 3], ['D', 4], ['A', 5], ['B', 6], ['C', 7], ['D', 8]])
>>> df1
                    Peter Adam
2023-12-22 00:01:00     A    C
2023-12-22 00:01:00     C    D

>>> df2
                    Level  Score
2023-12-22 00:01:00     A      1
2023-12-22 00:01:00     B      2
2023-12-22 00:01:00     C      3
2023-12-22 00:01:00     D      4
2023-12-22 00:02:00     A      5
2023-12-22 00:02:00     B      6
2023-12-22 00:02:00     C      7
2023-12-22 00:02:00     D      8

I wish to get

                    Peter Adam
2023-12-22 00:01:00     1    3
2023-12-22 00:01:00     7    8

I can do using for loop, either go through by columns of df1 with apply or something

I feel there is a transform I could do to achieve the values very fast.

Anyone has any idea?


Solution

  • A possible option using Indexing-lookup :

    def lookup(df, name):
        idx, cols = pd.factorize(df[name])
        return df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
       
    inter = df1.join(df2.pivot(columns="Level", values="Score"))
    
    out = (inter.assign(**{n: inter.pipe(lookup, n)
             for n in df1.columns})[df1.columns])
    

    NB : Your MRE seems broken. The second index of df1 should be 2023-12-22 00:02:00.

    Output :

    print(out)
    
                         Peter  Adam
    2023-12-22 00:01:00      1     3
    2023-12-22 00:02:00      7     8
    
    [2 rows x 2 columns]