
Row-wise Sum of single element calculations in Python


I would like to make a row-wise calculation with single elements of a pandas dataframe and elements of a list and finally make the row-wise sum of all these calculations:


The number of list items x_i is identical to the number of columns n_i in the dataframe. For each row, I would like to calculate the sum over all columns i:

    Total = Σ_i (2/3 · x_i² · n_i)

My solution yields only 4 values because I loop over the number of columns. Looping over the number of list elements instead does not change the result.

result = []
for i in range(len(df_n.columns)):   # these are 4
        total = sum(2/3 * x[i]**2 * df_n.iloc[i])
        result.append(total)
print(result)

Using mass = (math.pi/6 * binning[i]**3 * roh_solid * counts_nc.iloc[i]).sum() gives the same result.

My next attempt would be to additionally loop over the rows of the dataframe. A loop inside a loop feels like really bad Python programming, though. Do I have to use two loop variables to solve this?

Is there an optimized, Pythonic way to solve this? Or can you point me to a similar question whose solution I did not find with the search function?

A list example:

x = [0.4012, 0.551, 0.8124, 1.1402]

A dataframe example:

                        n_1          n_2            n_3            n_4  \
time                                                                          
2022-03-18 07:16:54  1.000000e-15  1.000000e-15  1.000000e-15  1.000000e-15   
2022-03-18 07:16:55  7.887821e-01  4.929888e-02  1.000000e-15  1.000000e-15   
2022-03-18 07:16:56  2.030013e+00  1.268758e-01  1.000000e-15  1.000000e-15   
2022-03-18 07:16:57  2.944119e+00  3.236459e-01  1.000000e-15  4.654615e-02   
2022-03-18 07:16:58  3.318537e+00  4.064088e-01  1.000000e-15  6.206153e-02

(the time column is an index column here.)


Solution

  • It looks like you can just vectorize your operation:

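    import numpy as np

    lst = [0.4012, 0.551, 0.8124, 1.1402]

    # scale each column n_i by 2/3 * x_i**2, then sum across the columns of each row
    out = df.mul(2/3 * np.array(lst)**2).sum(axis=1)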
    

    Output:

    time
    2022-03-18 07:16:54    1.616408e-15
    2022-03-18 07:16:55    9.462046e-02
    2022-03-18 07:16:56    2.435156e-01
    2022-03-18 07:16:57    4.217743e-01
    2022-03-18 07:16:58    4.921507e-01
    dtype: float64
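
To reproduce the output above from scratch, a self-contained sketch could look like the following. The dataframe is rebuilt by hand from the sample in the question, and the names `weights` and `row_totals` are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; df_n and x follow the question's naming.
index = pd.DatetimeIndex(
    [
        "2022-03-18 07:16:54",
        "2022-03-18 07:16:55",
        "2022-03-18 07:16:56",
        "2022-03-18 07:16:57",
        "2022-03-18 07:16:58",
    ],
    name="time",
)
df_n = pd.DataFrame(
    {
        "n_1": [1e-15, 7.887821e-01, 2.030013e+00, 2.944119e+00, 3.318537e+00],
        "n_2": [1e-15, 4.929888e-02, 1.268758e-01, 3.236459e-01, 4.064088e-01],
        "n_3": [1e-15, 1e-15, 1e-15, 1e-15, 1e-15],
        "n_4": [1e-15, 1e-15, 1e-15, 4.654615e-02, 6.206153e-02],
    },
    index=index,
)
x = [0.4012, 0.551, 0.8124, 1.1402]

# One weight per column: 2/3 * x_i**2 (hypothetical helper name)
weights = 2 / 3 * np.array(x) ** 2

# mul() aligns the 1-D array with the columns; sum(axis=1) then adds up each row
row_totals = df_n.mul(weights).sum(axis=1)
print(row_totals)
```

Because the array has one entry per column, no explicit loop over rows or columns is needed.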
    

    About your code

    I think there are two main mistakes in your code:

    • You select the columns incorrectly: df_n.iloc[i] gives you the i-th row, not the i-th column. You should use df_n.iloc[:, i].
    • sum(2/3 * x[i]**2 * df_n.iloc[:, i]) computes one total per column, which is why you end up with 4 values. Your definition of Total, however, sums over i (the columns) within each row, so you should get 5 values in the output, one per row.
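
The difference between the two indexing forms can be checked on a tiny frame. This is a minimal sketch; `df_demo` and its values are made up purely for illustration.

```python
import pandas as pd

# Small hypothetical frame, only to illustrate the indexing
df_demo = pd.DataFrame({"n_1": [1.0, 2.0, 3.0], "n_2": [10.0, 20.0, 30.0]})

row0 = df_demo.iloc[0]      # first ROW: a Series indexed by the column names
col0 = df_demo.iloc[:, 0]   # first COLUMN: a Series indexed by the row index

print(list(row0.index))  # ['n_1', 'n_2']
print(list(col0.index))  # [0, 1, 2]
```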

    Fixing only the first mistake would give:

    result = []
    for i in range(len(df_n.columns)):   # these are 4
            total = sum(2/3 * x[i]**2 * df_n.iloc[:, i])
            result.append(total)
    
    print(result)
    # [0.9745089642303895, 0.18342143066492023, 2.1999792e-15, 0.09413071358292742]
    

    Which is equivalent to:

    df_n.mul(2/3*np.array(x)**2).sum()
    
    n_1    9.745090e-01
    n_2    1.834214e-01
    n_3    2.199979e-15
    n_4    9.413071e-02
    dtype: float64
    

    But I believe that what you really need is:

    result = []
    for i in range(len(df_n.columns)):   # these are 4
            total = (2/3 * x[i]**2 * df_n.iloc[:, i])
            result.append(total)
    
    # sum per row
    result = list(map(sum, zip(*result)))
    
    print(result)
    # [1.6164081600000003e-15, 0.09462046128607064, 0.24351562363634796, 0.42177430406900446, 0.4921507194868146]
    

    Which, as shown above, vectorizes to:

    df_n.mul(2/3*np.array(x)**2).sum(axis=1)
    
    time
    2022-03-18 07:16:54    1.616408e-15
    2022-03-18 07:16:55    9.462046e-02
    2022-03-18 07:16:56    2.435156e-01
    2022-03-18 07:16:57    4.217743e-01
    2022-03-18 07:16:58    4.921507e-01
    dtype: float64
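
Since a weighted row-wise sum is a matrix-vector product, the same result can presumably also be written with DataFrame.dot. This is an alternative to the mul/sum form above, not something the original answer used; the sketch below compares both on a small made-up frame (`df_demo` and `x_demo` are hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen so the result is easy to check by hand
df_demo = pd.DataFrame({"n_1": [1.0, 2.0], "n_2": [3.0, 4.0]})
x_demo = [0.5, 2.0]

# One weight per column: 2/3 * x_i**2
weights = 2 / 3 * np.array(x_demo) ** 2

via_mul = df_demo.mul(weights).sum(axis=1)  # element-wise scaling, then row sum
via_dot = df_demo.dot(weights)              # same computation as a matrix-vector product

print(np.allclose(via_mul, via_dot))
```

With a plain NumPy array, dot matches weights to columns by position, so the list order has to follow the column order.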