
Python: How to subtract value from subselection of cells in a pandas DataFrame IN PLACE?


I am trying to subtract a value (50) from a subselection of cells in a pandas DataFrame. I want to subtract the value from 'rt' where subj == 1 and cond == 'std'. I would like to perform this calculation in place so that the other values stay untouched.

Let's say I have the following DataFrame:

import pandas as pd

data = {'subj': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        'rt': [100, 102, 101, 100, 101, 101, 105, 105, 106, 104, 104, 106],
        'cond': ['nov', 'std', 'std', 'emo', 'std', 'emo', 'nov', 'std', 'std',
                 'emo', 'std', 'emo']}
df = pd.DataFrame(data)

    subj   rt cond
0      1  100  nov
1      1  102  std
2      1  101  std
3      1  100  emo
4      1  101  std
5      1  101  emo
6      2  105  nov
7      2  105  std
8      2  106  std
9      2  104  emo
10     2  104  std
11     2  106  emo

Now I want to subtract 50 from the 'rt' values that meet the criteria subj == 1 and cond == 'std'. I am using the following code to subtract the value:

df['rt'] = df[(df['subj'] == 1) & (df['cond'] == 'std')]['rt'].subtract(50)

This is what I expect:

    subj   rt cond
0      1  100  nov
1      1   52  std
2      1   51  std
3      1  100  emo
4      1   51  std
5      1  101  emo
6      2  105  nov
7      2  105  std
8      2  106  std
9      2  104  emo
10     2  104  std
11     2  106  emo

Instead, this is what I get:

    subj   rt cond
0      1  NaN  nov
1      1  2.0  std
2      1  1.0  std
3      1  NaN  emo
4      1  1.0  std
5      1  NaN  emo
6      2  NaN  nov
7      2  NaN  std
8      2  NaN  std
9      2  NaN  emo
10     2  NaN  std
11     2  NaN  emo

How can I keep the remaining values of the rt column instead of getting NaN? I also want to subtract different values from other subsets of rt in the same manner, without creating a separate DataFrame for each condition.


Solution

  • Use boolean indexing with loc to modify the values in place. The subtraction is computed only for the 3 matching rows, and all other rows are left untouched:

    df.loc[df['subj'].eq(1) & df['cond'].eq('std'), 'rt'] -= 50
    

    output:

        subj   rt cond
    0      1  100  nov
    1      1   52  std
    2      1   51  std
    3      1  100  emo
    4      1   51  std
    5      1  101  emo
    6      2  105  nov
    7      2  105  std
    8      2  106  std
    9      2  104  emo
    10     2  104  std
    11     2  106  emo
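
The question also asks how to subtract different values for other conditions in the same manner. Here is a minimal sketch of one way to do that by repeating the same loc assignment in a loop. The offsets dictionary and the (2, 'emo') entry in it are made up for illustration and are not part of the original question. The sketch also prints the index of the filtered Series from the original attempt, which covers only rows 1, 2 and 4. Assigning that Series back to the whole column aligns on the index, so every row that is missing from it becomes NaN.

```python
import pandas as pd

data = {'subj': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        'rt': [100, 102, 101, 100, 101, 101, 105, 105, 106, 104, 104, 106],
        'cond': ['nov', 'std', 'std', 'emo', 'std', 'emo', 'nov', 'std', 'std',
                 'emo', 'std', 'emo']}
df = pd.DataFrame(data)

# The filtered Series from the original attempt only contains the matching rows.
# Assigning it to df['rt'] aligns on the index and leaves NaN everywhere else.
partial = df[(df['subj'] == 1) & (df['cond'] == 'std')]['rt'].subtract(50)
print(partial.index.tolist())

# Hypothetical mapping of (subj, cond) -> value to subtract (illustration only).
offsets = {(1, 'std'): 50, (2, 'emo'): 10}

for (subj, cond), offset in offsets.items():
    # Boolean mask selecting only the rows for this subject/condition pair.
    mask = df['subj'].eq(subj) & df['cond'].eq(cond)
    # In-place update of the selected 'rt' cells; other rows are not touched.
    df.loc[mask, 'rt'] -= offset

print(df)
```

Each pass through the loop touches only the rows selected by its mask, so no intermediate DataFrame is needed for any condition.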