Is there a way to do an Excel INDEX/MATCH in Python using two dataframes as inputs?

I have googled this, and the only relevant answers prescribe a merge, which isn't applicable in my case.

I have two data frames:

da

2023-08-14 06:30:01 B   C   D   E   F   G
2023-08-14 06:30:01 B   C   D   E   F   G
2023-08-14 06:30:02 B   C   D   E   F   G
2023-08-14 06:30:03 B   C   D   E   F   G
2023-08-14 06:30:04 B   C   D   E   F   G
2023-08-14 06:30:05 B   C   D   E   F   G
2023-08-14 06:30:06 A   B   C   E   F   G
2023-08-14 06:30:07 A   B   C   E   F   G

db

2023-08-14 06:30:01 28  26  8   -7  -17 -14
2023-08-14 06:30:01 28  26  8   -7  -17 -14
2023-08-14 06:30:02 28  26  8   -5  -17 -14
2023-08-14 06:30:03 28  26  5   -5  -17 -14
2023-08-14 06:30:04 28  26  5   -11 -17 -14
2023-08-14 06:30:05 28  26  5   -11 -17 -10
2023-08-14 06:30:06 33  28  26  -11 -17 -10
2023-08-14 06:30:07 34  28  26  -11 -17 -10

I'd like to combine the two: take the unique values from da, in order, as columns, then match on column label and time to pull the corresponding value from db into the resulting dataframe (0 where a label doesn't appear in a row), like the below:

dc

                    A   B   C   D   E   F   G
2023-08-14 06:30:01 0   28  26  8   -7  -17 -14
2023-08-14 06:30:01 0   28  26  8   -7  -17 -14
2023-08-14 06:30:02 0   28  26  8   -5  -17 -14
2023-08-14 06:30:03 0   28  26  5   -5  -17 -14
2023-08-14 06:30:04 0   28  26  5   -11 -17 -14
2023-08-14 06:30:05 0   28  26  5   -11 -17 -10
2023-08-14 06:30:06 33  28  26  0   -11 -17 -10
2023-08-14 06:30:07 34  28  26  0   -11 -17 -10

There's a one-to-one correspondence between da and db (i.e., the same number of rows and columns), so I could do this row by row, but I would prefer a solution that doesn't involve iteration, as the results aren't path dependent in any way.

I'm able to create column headers for dc via a map/set:

from itertools import chain

import pandas as pd

a = list(map(set, da.values.T))  # unique labels per column of da
b = sorted(set(chain.from_iterable(a)))  # sorted, since set order is arbitrary
dc = pd.DataFrame(columns=b)

but how do I populate the resulting dataframe per the logic above?

Solution

Here's one approach:

Minimal Reproducible Example

import pandas as pd
import numpy as np

idx = pd.date_range('2024-01-01', periods=2)

dfa = pd.DataFrame([[*'BCD'], [*'ABD']], index=idx)
dfb = pd.DataFrame([[1, 2, 3]]*2, index=idx)

# dfa

            0  1  2
2024-01-01  B  C  D
2024-01-02  A  B  D

# dfb

            0  1  2
2024-01-01  1  2  3
2024-01-02  1  2  3

Code

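# Sorted unique labels become the columns; the inverse maps each cell of dfa to its column
cols, col_indices = np.unique(dfa.values, return_inverse=True)
# NumPy 2.x returns the inverse in the shape of the input, so flatten it
col_indices = col_indices.ravel()

n_rows, n_cols = dfb.shape

# Target array, pre-filled with 0 for labels that are missing from a row
data = np.zeros((n_rows, len(cols)), dtype=dfb.values.dtype)

# Row number of every cell in dfb, in row-major order
row_indices = np.repeat(np.arange(n_rows), n_cols)
data[row_indices, col_indices] = dfb.values.ravel()

dfc = pd.DataFrame(data, columns=cols, index=dfb.index)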

Output:

            A  B  C  D
2024-01-01  0  1  2  3
2024-01-02  1  2  0  3

Explanation / intermediates

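Use np.unique to get the sorted unique values of dfa, which become the columns, and pass return_inverse=True to get col_indices: the position of each cell's label in that sorted array. Older NumPy versions return these indices flat, as shown here; NumPy 2.x returns them in the shape of the input, so call .ravel() on them before using them as an index.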

np.unique(dfa.values, return_inverse=True)

(array(['A', 'B', 'C', 'D'], dtype=object),
 array([1, 2, 3, 0, 1, 3], dtype=int64))
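To make the role of the inverse concrete, here is a small sketch (the names labels, inv and rebuilt are only for illustration). Indexing the unique values with the inverse reproduces the original labels, which is why the inverse can serve as the column position of every cell:

```python
import numpy as np

labels = np.array([["B", "C", "D"], ["A", "B", "D"]], dtype=object)

cols, inv = np.unique(labels, return_inverse=True)
inv = inv.ravel()  # flat on older NumPy, input-shaped on NumPy 2.x; ravel covers both

# Looking up each inverse index in the unique values rebuilds the input
rebuilt = cols[inv].reshape(labels.shape)
```

Here cols[inv[k]] is the label of the k-th cell in row-major order, so inv[k] is the column that cell belongs to in the target frame.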

Get n_rows and n_cols via dfb.shape.
Use n_rows and len(cols) inside np.zeros to set up the array for our target, dfc. Cells that never get assigned keep the value 0.

data

array([[0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int64)

Now, create row_indices using np.repeat on a range of length n_rows with n_cols as repeats.

row_indices

array([0, 0, 0, 1, 1, 1])
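The row and column index arrays are paired element by element, so a single assignment places every value. A minimal sketch with the arrays from this example written out by hand (values stands in for dfb flattened row by row):

```python
import numpy as np

n_rows, n_cols = 2, 3
row_indices = np.repeat(np.arange(n_rows), n_cols)  # row of each flattened cell
col_indices = np.array([1, 2, 3, 0, 1, 3])          # inverse from np.unique above
values = np.array([1, 2, 3, 1, 2, 3])               # dfb flattened row by row

data = np.zeros((n_rows, 4), dtype=values.dtype)
# The k-th value lands at (row_indices[k], col_indices[k])
data[row_indices, col_indices] = values
```

If the same label occurred twice in one row, both values would target the same cell and only one of them would survive, so this relies on labels being unique within each row.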

Use dfb.values with ravel to flatten dfb row by row, and assign it to data at the paired (row_indices, col_indices) positions.
Finally, use pd.DataFrame to create dfc, passing cols as the columns and the original index.
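For reuse, the steps can be wrapped in a function. This is only a sketch: index_match and its fill_value parameter are hypothetical names, not part of pandas or NumPy, and it assumes both frames have the same shape and that labels are unique within each row. It is run on three rows taken from the question's data, including the duplicated timestamp:

```python
import numpy as np
import pandas as pd

def index_match(labels_df, values_df, fill_value=0):
    """Hypothetical helper: spread values_df into one column per unique label."""
    if labels_df.shape != values_df.shape:
        raise ValueError("labels_df and values_df must have the same shape")
    values = values_df.to_numpy()
    cols, inv = np.unique(labels_df.to_numpy(), return_inverse=True)
    n_rows, n_cols = values.shape
    out = np.full((n_rows, len(cols)), fill_value, dtype=values.dtype)
    rows = np.repeat(np.arange(n_rows), n_cols)
    out[rows, inv.ravel()] = values.ravel()  # ravel: NumPy 2.x inverse is 2-D
    return pd.DataFrame(out, columns=cols, index=values_df.index)

idx = pd.to_datetime(
    ["2023-08-14 06:30:01", "2023-08-14 06:30:01", "2023-08-14 06:30:06"]
)
da = pd.DataFrame([list("BCDEFG"), list("BCDEFG"), list("ABCEFG")], index=idx)
db = pd.DataFrame(
    [[28, 26, 8, -7, -17, -14],
     [28, 26, 8, -7, -17, -14],
     [33, 28, 26, -11, -17, -10]],
    index=idx,
)
dc = index_match(da, db)
```

Because the assignment works on row positions rather than index labels, repeated timestamps in the index are not a problem.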