Search code examples
pythonpandasdataframesubtraction

NaNs when subtracting dataframes pandas


I have two dataframes with only somewhat overlapping indices and columns.

old = pd.DataFrame(index = ['A', 'B', 'C'],
                   columns = ['k', 'l', 'm'],
                   data = abs(np.floor(np.random.rand(3, 3)*10)))

new = pd.DataFrame(index = ['A', 'B', 'C', 'D'],
                   columns = ['k', 'l', 'm', 'n'],
                   data = abs(np.floor(np.random.rand(4, 4)*10)))

I want to calculate the difference between them and tried

delta = new - old

This gives lots of NaNs where indices and columns do not match. I would like to treat the abscence of the indices and columns as zeroes, (old['n', 'D'] = 0). old will always be a subspace of new.

Any ideas?

EDIT: I guess I didn't explain it thoroughly enough. I don't want to fill the delta dataframe with zeroes. I want to treat missing indices and columns in old as if they were zeroes. I would then get the value in new['n', 'D'] in delta instead of a NaN.


Solution

  • Use sub with fill_value=0:

    In [15]:
    old = pd.DataFrame(index = ['A', 'B', 'C'],
                       columns = ['k', 'l', 'm'],
                       data = abs(np.floor(np.random.rand(3, 3)*10)))
    ​
    new = pd.DataFrame(index = ['A', 'B', 'C', 'D'],
                       columns = ['k', 'l', 'm', 'n'],
                       data = abs(np.floor(np.random.rand(4, 4)*10)))
    delta = new.sub(old, fill_value=0)
    delta
    
    Out[15]:
       k  l  m  n
    A  0  3 -9  7
    B  0 -2  1  8
    C -4  1  1  7
    D  8  6  0  6