Search code examples
pythonpython-3.xpandasdataframemultiplication

Pandas: rowwise multiplication of two dataframes


I have two dataframes; A contains allocation fractions and B contains hourly volumes. To get the right volume for each bus for a given hour, I need to multiply A with each row of dataframe B. For a given hour (x), this would be a simple multiplication of A * B.loc[x].

A =       col_a  col_b  col_c   col_d   col_e               
     0    0.0    0.0    0.0     0.0     1.0
     1    0.0    0.0    1.0     0.0     0.0
     2    0.0    1.0    0.0     0.5     0.0
     3    0.5    0.0    0.0     0.5     0.0
     4    0.5    0.0    0.0     0.0     0.0
B =     col_a   col_b   col_c   col_d   col_e
    0   12881   598     154     180     0.0 
    1   12881   680     154     180     0.0
    2   11617   806     154     180     0.0
    3   12991   100     154     180     0.0

However, I want to do this multiplication for each hour at once and create a large multiindex dataframe C.

C =          col_a   col_b  col_c  col_d  col_e
    hr  bus                     
    0   0    0.0     0.0    0.0    0.0   0.0
        1    0.0     0.0    154.0  0.0   0.0
        2    0.0     598.0  0.0    90.0  0.0
        3    6440.5  0.0    0.0    90.0  0.0
        4    6440.5  0.0    0.0    0.0   0.0
    1   0    0.0     0.0    0.0    0.0   0.0
        1    0.0     0.0    154.0  0.0   0.0
        2    0.0     680.0  0.0    90.0  0.0

I managed to create this dataframe with a list operation and overwriting the index of the resulting dataframe. I would not consider this a very good practice and wonder if there's a better approach that doesn't require an overwrite of the index?

dfs = [A.mul(B.loc[i]) for i in B.index]
C = pandas.concat(dfs)

C.index = pandas.MultiIndex.from_product([B.index, A.index], names=['hr', 'bus'])

Solution

  • First "replicate" B DataFrame, reformatting the index:

    BB = pd.DataFrame(np.repeat(B.values, A.index.size, axis=0), columns=B.columns,
        index=pd.MultiIndex.from_product((B.index, A.index), names=['hr', 'bus']))
    

    Then compute the result:

    result = A.mul(BB, level=1)
    

    The result is:

             col_a  col_b  col_c  col_d  col_e
    hr bus                                    
    0  0       0.0    0.0    0.0    0.0    0.0
       1       0.0    0.0  154.0    0.0    0.0
       2       0.0  598.0    0.0   90.0    0.0
       3    6440.5    0.0    0.0   90.0    0.0
       4    6440.5    0.0    0.0    0.0    0.0
    1  0       0.0    0.0    0.0    0.0    0.0
       1       0.0    0.0  154.0    0.0    0.0
       2       0.0  680.0    0.0   90.0    0.0
       3    6440.5    0.0    0.0   90.0    0.0
       4    6440.5    0.0    0.0    0.0    0.0
    2  0       0.0    0.0    0.0    0.0    0.0
       1       0.0    0.0  154.0    0.0    0.0
       2       0.0  806.0    0.0   90.0    0.0
       3    5808.5    0.0    0.0   90.0    0.0
       4    5808.5    0.0    0.0    0.0    0.0
    3  0       0.0    0.0    0.0    0.0    0.0
       1       0.0    0.0  154.0    0.0    0.0
       2       0.0  100.0    0.0   90.0    0.0
       3    6495.5    0.0    0.0   90.0    0.0
       4    6495.5    0.0    0.0    0.0    0.0