Search code examples
pythonpandasstringcumsum

Pandas sum() with character condition


I have the following dataframe:

>>>variable    value
0  var1        0.25
1  var2        0.11
2  tt_var1     0.01
3  tt_var2     0.0004
4  tt_var3     0.07
5  bb_var5     0.102
6  var_3       0.45
7  bb_var74    0.23

I want to use cumsum() in order to sum the values in column "1", but only for specific variables: I want to sum all the variables that start with tt and all the variable that start with bb in my dataframe, so in the end i'll have the folowing table :

>>>variable    value
0  var1        0.25
1  var2        0.11
2  tt          *sum of all tt **
3  bb          *sum of all bb**
4  var_3       0.45

I know how to use cumsum but I haven't found any way to specify it to be only on specific rows that have something in common in one row (e.g the letters bb or tt).

Is there any way to use cumsum() in this way?


Solution

  • Use groupby,

    grouper = df['variable'].replace({'tt_.*': 'tt', 'bb_.*': 'bb'}, regex=True)
    df.groupby(grouper).sum().reset_index()
    

    Output

      variable   value
    0       bb  0.3320
    1       tt  0.0804
    2     var1  0.2500
    3     var2  0.1100
    4    var_3  0.4500