Tags: python, pandas, dataframe, subtraction

How to subtract a one-row pandas Series from a multi-row pandas Series?


I have a pandas DataFrame with 962 columns:

print(df)
         ID  doy  WaitID  Year  ...       212
386    1895  193   14507  2001  ...  0.407672
389    1899  192   14511  2001  ...  0.000000
390    1900  204   14512  2001  ...  0.000000
391    1902  145   14514  2001  ...  2.251606
395    1877  204   14491  2001  ...  1.727977
...     ...  ...     ...   ...  ...       ...
20279   369  189   32767  2001  ...  1.727977
20281   371  174   32767  2001  ...  2.038362
20292   356  170   32767  2001  ...  0.407672
20295   359  174   32767  2001  ...  0.815345
20296   360  201   32767  2001  ...  2.038362

and another DataFrame that I have reduced to a single row of averages:

print(mean_df)
   ID     WaitID  ...        212        213    Year
0  3.1        3.0  ...  35.939027  24.231911  2000.0

I want to subtract the value in column 212 of mean_df from the column of the same name in df. I tried the following, but it gives me NaN:

df['212_x'] = df['212'].subtract(mean_df['212'], fill_value=0)
print(df)

         ID  doy  WaitID  Year  ...       212  212_x
386    1895  193   14507  2001  ...  0.407672    NaN
389    1899  192   14511  2001  ...  0.000000    NaN
390    1900  204   14512  2001  ...  0.000000    NaN
391    1902  145   14514  2001  ...  2.251606    NaN
395    1877  204   14491  2001  ...  1.727977    NaN
...     ...  ...     ...   ...  ...       ...
20279   369  189   32767  2001  ...  1.727977    NaN
20281   371  174   32767  2001  ...  2.038362    NaN
20292   356  170   32767  2001  ...  0.407672    NaN
20295   359  174   32767  2001  ...  0.815345    NaN
20296   360  201   32767  2001  ...  2.038362    NaN

How can I subtract the one-row pandas Series from the multi-row one?


Solution

  • The issue is that mean_df['212'] is a one-element Series, so the subtraction aligns on the index (here, label 0) instead of applying the value to every row.

    You can squeeze the one-element Series to a scalar:

    df['212_x'] = df['212'].subtract(mean_df['212'].squeeze(), fill_value=0)
    
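To make the alignment issue concrete, here is a minimal sketch on made-up stand-ins for df and mean_df (three rows copied from the question, plus the mean value shown there). It only illustrates the mechanism and is not the asker's actual data. Plain Series-minus-Series arithmetic matches row labels first, whereas a scalar is applied to every row:

```python
import pandas as pd

# Small stand-in for df: row labels that do not include 0, as in the question.
df = pd.DataFrame({"212": [0.407672, 0.000000, 2.251606]}, index=[386, 389, 391])

# Stand-in for mean_df: a one-row DataFrame whose only row label is 0.
mean_df = pd.DataFrame({"212": [35.939027]})

# Series minus Series: pandas matches row labels before subtracting.
# No label in df equals 0, so nothing lines up and every result is NaN.
aligned = df["212"] - mean_df["212"]
print(aligned)

# squeeze() turns the one-element Series into a scalar, which is
# applied to every row instead of being aligned by label.
scalar = mean_df["212"].squeeze()
df["212_x"] = df["212"].subtract(scalar, fill_value=0)
print(df)
```

Note that fill_value only matters here if df['212'] itself contains missing values. With a scalar on the right-hand side, no alignment takes place.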

    It would probably be best to avoid a one-row DataFrame in the first place and work with a Series instead:

    mean = mean_df.squeeze()
    # or
    mean = mean_df.iloc[0]

    # then
    df['212_x'] = df['212'].subtract(mean['212'], fill_value=0)
    

    Output:

             ID  doy  WaitID  Year  ...       212      212_x
    386    1895  193   14507  2001  ...  0.407672 -35.531355
    389    1899  192   14511  2001  ...  0.000000 -35.939027
    390    1900  204   14512  2001  ...  0.000000 -35.939027
    391    1902  145   14514  2001  ...  2.251606 -33.687421
    395    1877  204   14491  2001  ...  1.727977 -34.211050
    20279   369  189   32767  2001  ...  1.727977 -34.211050
    20281   371  174   32767  2001  ...  2.038362 -33.900665
    20292   356  170   32767  2001  ...  0.407672 -35.531355
    20295   359  174   32767  2001  ...  0.815345 -35.123682
    20296   360  201   32767  2001  ...  2.038362 -33.900665
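
The Series-based variant from the answer can be checked end to end with a small, self-contained sketch. The two frames below are hypothetical stand-ins built from a few values visible in the question (columns 212 and 213 of mean_df, three rows of df). The real data has many more rows and columns.

```python
import pandas as pd

# Stand-in for df with three of the rows shown in the question.
df = pd.DataFrame(
    {"212": [0.407672, 2.251606, 1.727977]},
    index=[386, 391, 395],
)

# Stand-in for mean_df: one row, one value per column.
mean_df = pd.DataFrame({"212": [35.939027], "213": [24.231911]})

# Both calls turn the one-row DataFrame into a Series indexed by column name.
mean_a = mean_df.squeeze()
mean_b = mean_df.iloc[0]

# Looking up a column name in that Series yields a scalar, so the
# subtraction applies to every row of df without any label matching.
df["212_x"] = df["212"].subtract(mean_a["212"], fill_value=0)
print(df)
```

One difference between the two options: squeeze() only collapses axes of length 1, so it returns the DataFrame unchanged if mean_df ever has more than one row and more than one column, while iloc[0] always returns the first row as a Series.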