I am working on complex calculations of a dataset that requires recalculating based on the sum of the previous rows. I'll try to articulate as best as I can.
I have an n x n dataset, I need to create a new n x n dataset that takes the value from the first record, subtracts the cumulative sum of that row up to that value, and then multiplies by a value from another 1 x n series.
For example: n x n Dataset :
i | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 200 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 200 | 150 | 0 | 0 | 0 | 0 | 0 |
4 | 200 | 150 | 125 | 0 | 0 | 0 | 0 |
5 | 200 | 150 | 125 | 100 | 0 | 0 | 0 |
6 | 200 | 150 | 125 | 100 | 75 | 0 | 0 |
1 x n Series :
i | 0 |
---|---|
0 | .000 |
1 | .001 |
2 | .002 |
3 | .003 |
4 | .004 |
5 | .005 |
6 | .006 |
Result:
i | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | .4 [(200 - sum 0) * .002] | 0 | 0 | 0 | 0 | 0 |
3 | .5988 [(200 - sum .4) * .003] | .45 [(150 - sum 0) * .003] | 0 | 0 | 0 | 0 |
4 | .7960 [(200 - sum (.4, .5988)) * .004] | .5982 [(150 - sum .45) * .004] | .50 | 0 | 0 | 0 |
5 | .9910 [(200 - sum (.4, .5988, .7960)) * .005] | .7447 [(150 - sum .45, .5982) * .005] | .6225 | .50 | 0 | 0 |
6 | 1.1832 [(200 - sum (.4, .5988, .7960, .9910)) * .006] | .8892 [(150 - sum .45, .5982, .7447) * .006] | .74326 | .5970 | .450 | 0 |
I am good at using pandas or numpy to get this data. I am currently looping over each row and column and it is taking a really long time to calculate.
Any help is greatly appreciated.
This is code that works using looping.
import numpy as np
import pandas as pd
df_a =pd.DataFrame([[0,0,0,0,0,0,0],[0,0,0,0,0,0,0],[200,0,0,0,0,0,0],[200,150,0,0,0,0,0],[200,150,125,0,0,0,0],[200,150,125,100,0,0,0],[200,150,125,100,75,0,0]]).astype(float)
df_series = pd.Series([0.0,.001, .002, .003, .004, .005, .006])
df_b = pd.DataFrame(np.arange(49).reshape(7,7)).astype(float)
for i in range(df_a.shape[0]): #iterate over rows
for j in range(df_a.shape[1]): #iterate over columns
df_a_val = df_a.iloc[i, j] #source dataset value
ser_val = df_series.iloc[i] #value of series at row
df_b_prev = df_b.iloc[:i,j] #get series of previous values
df_b_sum = df_b_prev.sum() #sum previous values
df_b.at[i, j] = (df_a_val-df_b_sum)*(ser_val) #cell based calculation and set value
print("(df_a_val: {} - df_b_sum: {}) * ser_val : {} = {}".format(df_a_val, df_b_sum, ser_val, df_b.at[i, j]))
print(df_series)
print(df_b)
You could use shift
after broadcasted multiplication
a = (df_a > 0) * df_series.to_numpy()[:,None]
df_a * a * (1 - a).cumprod(0).shift(fill_value=0)
0 1 2 3 4 5 6
0 0.000000 0.000000 0.000000 0.000 0.00 0.0 0.0
1 0.000000 0.000000 0.000000 0.000 0.00 0.0 0.0
2 0.400000 0.000000 0.000000 0.000 0.00 0.0 0.0
3 0.598800 0.450000 0.000000 0.000 0.00 0.0 0.0
4 0.796005 0.598200 0.500000 0.000 0.00 0.0 0.0
5 0.991026 0.744759 0.622500 0.500 0.00 0.0 0.0
6 1.183285 0.889242 0.743265 0.597 0.45 0.0 0.0