Tags: python, pandas, sum, conditional-statements

Pandas SUM value by Index


How can I sum the `numb` values of all rows that share the same `schema` string, and add each sum as an extra row after its group?

Original DF:

schema numb
tbc_hh 1751732
tbc_hh 1084173
tbc_rr 1751732
tbc_rr 1084173
tbc_zz 1751732
tbc_zz 1084173

Expected DF:

schema numb
tbc_hh 1751732
tbc_hh 1084173
tbc_hh_total 2835905
tbc_rr 1751732
tbc_rr 1084173
tbc_rr_total 2835905
tbc_zz 1751732
tbc_zz 1084173
tbc_zz_total 2835905

Solution

  • You can group by `schema` and use `pd.concat` to append a total row to each group:

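    import pandas as pd

    # For each schema group, append one row holding that group's total,
    # then flatten the per-group results back into a single frame.
    out = (df.groupby('schema', as_index=False)
           .apply(lambda g: pd.concat([g,
                                       pd.DataFrame([[g.name+'_total', g['numb'].sum()]],
                                                    columns=g.columns)]))
           .reset_index(drop=True))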
    
    print(out)
    
             schema     numb
    0        tbc_hh  1751732
    1        tbc_hh  1084173
    2  tbc_hh_total  2835905
    3        tbc_rr  1751732
    4        tbc_rr  1084173
    5  tbc_rr_total  2835905
    6        tbc_zz  1751732
    7        tbc_zz  1084173
    8  tbc_zz_total  2835905
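
An alternative that avoids `groupby.apply` (recent pandas versions may warn when `apply` touches the grouping column) is sketched below. It assumes `df` holds the data from the question, rebuilt here so the snippet runs on its own. The helper column `key` is a name introduced purely for illustration. The idea is to compute one total row per schema, concatenate those rows with the original ones, and use a stable sort on the original schema name so each total lands directly after its group.

```python
import pandas as pd

# Assumed reconstruction of the question's DataFrame.
df = pd.DataFrame({
    "schema": ["tbc_hh", "tbc_hh", "tbc_rr", "tbc_rr", "tbc_zz", "tbc_zz"],
    "numb": [1751732, 1084173, 1751732, 1084173, 1751732, 1084173],
})

# One total row per schema; remember the original name as a sort key
# (hypothetical helper column), then rename the row to "<schema>_total".
totals = df.groupby("schema", as_index=False)["numb"].sum()
totals["key"] = totals["schema"]
totals["schema"] = totals["schema"] + "_total"

# Original rows come first in the concat, so a stable sort on the key
# keeps them ahead of their group's total row.
out = (pd.concat([df.assign(key=df["schema"]), totals])
       .sort_values("key", kind="mergesort")
       .drop(columns="key")
       .reset_index(drop=True))

print(out)
```

Like the `groupby` version above, this orders the groups alphabetically by `schema` rather than by first appearance, which makes no difference for this data.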