I would like to do a row-wise calculation that combines the elements of a pandas DataFrame with the elements of a list, and then sum the results for each row.
The number of list items x_i is identical to the number of columns n_i in the DataFrame. I would like to calculate row-wise sums like Total = sum over i of (2/3 * x_i**2 * n_i).
My solution yields only 4 values because I loop over the number of columns. Looping over the number of list elements instead does not change the result.
result = []
for i in range(len(df_n.columns)):  # these are 4
    total = sum(2/3 * x[i]**2 * df_n.iloc[i])
    result.append(total)
print(result)
Calling .sum() as a method instead, as in
mass = (math.pi/6 * binning[i]**3 * roh_solid * counts_nc.iloc[i]).sum()
gives the same result.
My next attempt would be to additionally loop over the length of the DataFrame. A loop inside a loop feels like really bad Python programming. Do I have to use two loop variables to solve this?
Is there an optimized Python way to solve this? Or can you point me to a similar question whose solution I did not find with the search function?
A list example:
x = [0.4012, 0.551, 0.8124, 1.1402]
A dataframe example:
                              n_1           n_2           n_3           n_4
time
2022-03-18 07:16:54  1.000000e-15  1.000000e-15  1.000000e-15  1.000000e-15
2022-03-18 07:16:55  7.887821e-01  4.929888e-02  1.000000e-15  1.000000e-15
2022-03-18 07:16:56  2.030013e+00  1.268758e-01  1.000000e-15  1.000000e-15
2022-03-18 07:16:57  2.944119e+00  3.236459e-01  1.000000e-15  4.654615e-02
2022-03-18 07:16:58  3.318537e+00  4.064088e-01  1.000000e-15  6.206153e-02
(The time column is the index here.)
It looks like you can just vectorize your operation:
import numpy as np
lst = [0.4012, 0.551, 0.8124, 1.1402]
out = df.mul(2/3*np.array(lst)**2).sum(axis=1)
Output:
time
2022-03-18 07:16:54 1.616408e-15
2022-03-18 07:16:55 9.462046e-02
2022-03-18 07:16:56 2.435156e-01
2022-03-18 07:16:57 4.217743e-01
2022-03-18 07:16:58 4.921507e-01
dtype: float64
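For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand, which is an assumption about how the real frame is laid out; the name weights is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

lst = [0.4012, 0.551, 0.8124, 1.1402]

# Sample frame typed in from the question (assumed layout: one column per list item).
df = pd.DataFrame(
    {
        "n_1": [1e-15, 7.887821e-01, 2.030013e+00, 2.944119e+00, 3.318537e+00],
        "n_2": [1e-15, 4.929888e-02, 1.268758e-01, 3.236459e-01, 4.064088e-01],
        "n_3": [1e-15, 1e-15, 1e-15, 1e-15, 1e-15],
        "n_4": [1e-15, 1e-15, 1e-15, 4.654615e-02, 6.206153e-02],
    },
    index=pd.DatetimeIndex(
        [
            "2022-03-18 07:16:54",
            "2022-03-18 07:16:55",
            "2022-03-18 07:16:56",
            "2022-03-18 07:16:57",
            "2022-03-18 07:16:58",
        ],
        name="time",
    ),
)

# One weight per column: 2/3 * x_i**2 (hypothetical helper name).
weights = 2 / 3 * np.array(lst) ** 2

# mul() broadcasts the 1-D array across the columns by position;
# sum(axis=1) then adds the weighted columns within each row.
out = df.mul(weights).sum(axis=1)
print(out)
```

Note that the array is matched to the columns by position, so the list has to be in the same order as the columns.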
I think there are two main mistakes in your code.
First, df_n.iloc[i] gives you the i-th row, not the i-th column. You should use df_n.iloc[:, i].
Second, sum(2/3 * x[i]**2 * df_n.iloc[:, i]) computes the total per column, which is why you end up with 4 values. However, your definition of Total sums over i (the columns), so you should get one value per row, that is, 5 values for the sample data.
Fixing only the first mistake gives:
result = []
for i in range(len(df_n.columns)):  # these are 4
    total = sum(2/3 * x[i]**2 * df_n.iloc[:, i])
    result.append(total)
print(result)
# [0.9745089642303895, 0.18342143066492023, 2.1999792e-15, 0.09413071358292742]
Which is equivalent to:
df_n.mul(2/3*np.array(x)**2).sum()
n_1 9.745090e-01
n_2 1.834214e-01
n_3 2.199979e-15
n_4 9.413071e-02
dtype: float64
But I believe that what you really need is:
result = []
for i in range(len(df_n.columns)):  # these are 4
    total = (2/3 * x[i]**2 * df_n.iloc[:, i])
    result.append(total)
# sum per row
result = list(map(sum, zip(*result)))
print(result)
# [1.6164081600000003e-15, 0.09462046128607064, 0.24351562363634796, 0.42177430406900446, 0.4921507194868146]
Which, as shown above, vectorizes to:
df_n.mul(2/3*np.array(x)**2).sum(axis=1)
time
2022-03-18 07:16:54 1.616408e-15
2022-03-18 07:16:55 9.462046e-02
2022-03-18 07:16:56 2.435156e-01
2022-03-18 07:16:57 4.217743e-01
2022-03-18 07:16:58 4.921507e-01
dtype: float64
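On the question of whether two loop variables are needed: the row-wise total is a weighted sum over the columns, so it can also be written as a matrix-vector product. The sketch below uses a small made-up frame (toy values, not the data from the question) and compares an explicit double loop with the mul/sum form and with DataFrame.dot. The names weights, via_loop, via_mul and via_dot are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data chosen only for illustration.
x = [1.0, 2.0]
df_n = pd.DataFrame({"n_1": [1.0, 3.0], "n_2": [2.0, 4.0]})

weights = 2 / 3 * np.array(x) ** 2

# Explicit double loop: r walks the rows, i walks the columns.
via_loop = [
    sum(2 / 3 * x[i] ** 2 * df_n.iloc[r, i] for i in range(df_n.shape[1]))
    for r in range(len(df_n))
]

# Vectorized: scale each column, then sum within each row.
via_mul = df_n.mul(weights).sum(axis=1)

# Same computation expressed as a matrix-vector product.
via_dot = df_n.dot(weights)

print(via_loop)
print(via_mul.tolist())
print(via_dot.tolist())
```

All three give one value per row. The vectorized forms avoid Python-level loops entirely, which usually matters once the frame has many rows.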