Search code examples
pythonpandasdataframesubtraction

How to subtract row values in reference to the associated columns in pandas


I would like to subtract consecutive values in rows in reference to their values in the other columns.

The data frame looks like this:

     A        B    
1   "Sat"     7  
2   "Sat"     9
3   "Sat"     5
4   "Sun"     8  
5   "Sun"     4
6   "Sun"     6

I would like to subtract consecutive values of the B column in reference to the A column and the output should be like this:

     A        B    
1   "Sat"     0  
2   "Sat"    +2
3   "Sat"    -4
4   "Sun"     0  
5   "Sun"    -4
6   "Sun"     2

I have been using df.diff() to get this done in the whole dataframe but here it is like (key, value) with key from A column and several values from the B column. I want to subtract values in B column as long as the key in A column is "Sat" and when it encounter the second key in A column which is "Sun" should restart subtraction.

Any help in solving this problem?


Solution

  • First if necessary sort_values by column A, then use groupby with diff, for replace NaN fillna:

    df['B'] = df.sort_values('A').groupby('A')['B'].diff().fillna(0)
    print (df)
         A    B
    1  Sat  0.0
    2  Sat  2.0
    3  Sat -4.0
    4  Sun  0.0
    5  Sun -4.0
    6  Sun  2.0
    

    If need cast to int:

    df['B'] = df.sort_values('A').groupby('A')['B'].diff().fillna(0).astype(int)
    print (df)
         A  B
    1  Sat  0
    2  Sat  2
    3  Sat -4
    4  Sun  0
    5  Sun -4
    6  Sun  2
    

    Another solution with diff, where add 0 if mask is not equal with shifted column A by ne:

    df['B'] = df.B.diff().mask(df.A.ne(df.A.shift()), 0)
    print (df)
         A    B
    1  Sat  0.0
    2  Sat  2.0
    3  Sat -4.0
    4  Sun  0.0
    5  Sun -4.0
    6  Sun  2.0