Search code examples
pythonpandasaggregate

Sum specific number of columns for each row with Pandas


I have the dollowing dataframe:

   name  code     1   2  3  4  5  6  7 .........155    days
0  Lari  EH214    0   5  2  1  0  0  0           0      3
1  Suzi  FK362    0   0  0  0  2  3  0           0      108
2  Jil   LM121    0   0  4  2  1  0  0           0      5
...

I want to sum the column between column 1 to column with the number that appears on "days" , for example,
for row 1, I will sum 3 days-> 0+5+2
For row 2 108 days,
for row 3 5 days->0+4+2+1+0

How can I do something like this? Looking for method.


Solution

  • For vectorized solution filter rows by positions first and get mask by compare days in numpy boroadasting, if not match replace 0 in DataFrame.where and last sum:

    df1 = df.iloc[:, 2:-1]
    
    m = df1.columns.astype(int).to_numpy() <= df['days'].to_numpy()[:, None]
    
    df['sum'] = df1.where(m, 0).sum(axis=1)
    print (df)
       name   code  1  2  3  4  5  6  7  155  days  sum
    0  Lari  EH214  0  5  2  1  0  0  0    0     3    7
    1  Suzi  FK362  0  0  0  0  2  3  0    0   108    5
    2   Jil  LM121  0  0  4  2  1  0  0    0     5    7