Search code examples
pandasdataframemultiplication

Pandas Dataframe multiply with only the right dataframe taking fill_value


The fill_value argument of pandas.DataFrame.multiply() fills missing values in both dataframes. However, I'd like to have only missing values filled in the 2nd DataFrame. What would be a good way beyond my hacky solution below?

>>> df1 = pd.DataFrame({'a':[1, np.nan, 2], 'b':[np.nan, 3, 4]}, index = [1, 2, 3])
>>> df1
     a    b
1  1.0  NaN
2  NaN  3.0
3  2.0  4.0

>>> df2 = pd.DataFrame({'a':[2, np.nan], 'b':[3, np.nan], 'c':[1, 1]}, index = [1, 2])
>>> df2
     a    b    c
1  2.0  3.0  1.0
2  NaN  NaN  1.0

I would like to multiply the two DataFrames element-wise, by keeping df1 as the dominant one so that the resulting shape and NaN entries should match df1, while filling NaNs in df2 by value 1, to get

     a    b
1  2.0  NaN
2  NaN  3.0
3  2.0  4.0

The naive solution doesn't work:

>>> df1.multiply(df2, fill_value=1)
     a    b    c
1  2.0  3.0  1.0
2  NaN  3.0  1.0
3  2.0  4.0  NaN

My hacky solution is to create a matrix with 1s where df1 has value, and update by df2

>>> df3 = df1/df1
>>> df3.update(df2)
>>> df3
     a    b
1  2.0  3.0
2  NaN  1.0
3  1.0  1.0
>>> df1.multiply(df3)
     a    b
1  2.0  NaN
2  NaN  3.0
3  2.0  4.0

It just doesn't feel very elegant. Any cool idea on direct manipulation with df1 and df2, hopefully a one-liner?


Solution

  • You can use reindex and fillna on df2:

    df1.multiply(df2.reindex(df1.index).fillna(1))
    
         a    b
    1  2.0  NaN
    2  NaN  3.0
    3  2.0  4.0
    

    You don't need to explicitly call multiply in this case, and can just use * for multiplication:

    df1 * df2.reindex(df1.index).fillna(1)
    
         a    b
    1  2.0  NaN
    2  NaN  3.0
    3  2.0  4.0
    

    Additionally, if you need to align the columns of df2 with df1, use the columns parameter of reindex:

    df1 * df2.reindex(index=df1.index, columns=df1.columns).fillna(1)