Search code examples
pythonpandasdataframesumproduct

sumproduct 2 dataframes with nan values


Have 2 dataframes df['x'] and df['y']. How do i multiply the 2 tables and sum the rows?

x0  x1  x2      y0  y1  y2
2   5   9       2   3   4
Nan 4   7       2   2   3
3   Nan 3       1   2   3

Desired output:
(2x2)+(5x3)+(9x4) = 55
etc..

Nan should be treated as 0.
I keep hitting results with nan values with the below code.
Any other more efficient ways to solve this?

df['z'] = (df['x'].to_numpy() * df['y'].to_numpy()).sum(axis=1)

Solution

  • Pandas supports arithmetic operations over dataframe, and also provides flexible wrappers for them.

     df.method()   operator
    ------------------------
         add          +
         sub          -
         mul          *
         div          /
         mod          %
         pow          **
    

    Simply multiply the original dataframe with nan values filled.

    #METHOD 1
    (df_x.fillna(0)*df_y.values).sum(1)
    

    Or you can use pd.DataFrame.mul() which is equivalent to * but provides the functionality of adding fill_value

    #METHOD 2
    df_x.mul(df_y.values, fill_value=0).sum(1)
    
    0    55.0
    1    29.0
    2    12.0
    dtype: float64
    

    Note: If df_y as nan values as well, then replace df_y.values with df_y.fillna(0).values.