I have a dataframe that looks like the below. You can see that, to start, the SMA column is a duplicate of the Index Value column intentionally. I'm trying to adjust the SMA column so that it will be the result of a simple calculation, something to the effect of the below, which I believe would require a loop of some kind:
for i in range(len(df)):
df['SMA'].loc[i,"SMA"]= df['SMA'].loc[i-1,"SMA"] + df['SMA'].loc[i,"ER"] * (df['SMA'].loc[i,"Index Value"] - df['SMA'].loc[i-1,"SMA"])
In words, I'm attempting to take the current row of 'SMA' and have it be the output of the prior row of "SMA" + the current row of "ER" multiplied by the difference between the current row of "Index Value" and prior row of "SMA"
data = {
'Date': ['9/5/2023', '9/6/2023', '9/7/2023', '9/8/2023', '9/11/2023', '9/12/2023',
'9/13/2023', '9/14/2023', '9/15/2023', '9/18/2023', '9/19/2023', '9/20/2023',
'9/21/2023', '9/22/2023', '9/25/2023', '9/26/2023', '9/27/2023', '9/28/2023',
'9/29/2023'],
'Index Value': [4496.83, 4465.48, 4451.14, 4457.49, 4487.46, 4461.9, 4467.44,
4505.1, 4450.32, 4453.53, 4443.95, 4402.2, 4330, 4320.06,
4337.44, 4273.53, 4274.51, 4299.7, 4288.05],
'ER': [0.15066, 0.157105, 0.218561, 0.233893, 0.233709, 0.191352, 0.090935,
0.398004, 0.127833, 0.062296, 0.198933, 0.382776, 0.494406, 0.541878,
0.44323, 0.489456, 0.466276, 0.396806, 0.52569],
'SMA': [4496.83, 4465.48, 4451.14, 4457.49, 4487.46, 4461.9, 4467.44, 4505.1,
4450.32, 4453.53, 4443.95, 4402.2, 4330, 4320.06, 4337.44, 4273.53,
4274.51, 4299.7, 4288.05]
}
df = pd.DataFrame(data)
In an effort to avoid a loop since I'm not great with loop construction, I've tried something like the below, which is close, but not exactly what I want.
df['SMA']=df['SMA'].shift(1) + df['ER'] * (df['Index Value'] - df['SMA'].shift(1))
The reason why it's not exactly what I want is because I want the first number of the Index Value column to match the first value of SMA, and have the formula apply only after that point (which is why it seems a loop would be necessary).
All of that being said and given the goal, can someone please advise what the best way might be to accomplish this task?
First, you can simplify your existing code in two ways:
'Index Value'
in 'SMA'
as a first step, you can simply work with the values from 'Index Value'
;diff
instead of df[col] - df[col].shift()
.df["Index Value"].shift() + df["ER"] * df["Index Value"].diff()
Now to answer your question, IIUC you want to keep the first value of column 'Index Value'
(i.e., 4496.83
) and apply the formula only from the second row onward.
There are a few options to do that:
# 1. use `at` and exclude the first value when creating the new column
df.at[1:, "SMA"] = df["Index Value"].shift() + df["ER"] * df["Index Value"].diff()
# 2. create the column and use `at` and `iloc` to replace the first value
df["SMA"] = df["Index Value"].shift() + df["ER"] * df["Index Value"].diff()
df.at[0, "SMA"] = df.iloc[0]["Index Value"]
# 3. create the column and use `fillna` to replace the first value
df["SMA"] = df["Index Value"].shift() + df["ER"] * df["Index Value"].diff()
df["SMA"] = df["SMA"].fillna(df["Index Value"])
All will result in:
Date Index Value ER SMA
0 9/5/2023 4496.83 0.150660 4496.830000
1 9/6/2023 4465.48 0.157105 4491.904758
2 9/7/2023 4451.14 0.218561 4462.345835
3 9/8/2023 4457.49 0.233893 4452.625221
4 9/11/2023 4487.46 0.233709 4464.494259
5 9/12/2023 4461.90 0.191352 4482.569043
6 9/13/2023 4467.44 0.090935 4462.403780
7 9/14/2023 4505.10 0.398004 4482.428831
8 9/15/2023 4450.32 0.127833 4498.097308
9 9/18/2023 4453.53 0.062296 4450.519970
10 9/19/2023 4443.95 0.198933 4451.624222
11 9/20/2023 4402.20 0.382776 4427.969102
12 9/21/2023 4330.00 0.494406 4366.503887
13 9/22/2023 4320.06 0.541878 4324.613733
14 9/25/2023 4337.44 0.443230 4327.763337
15 9/26/2023 4273.53 0.489456 4306.158867
16 9/27/2023 4274.51 0.466276 4273.986950
17 9/28/2023 4299.70 0.396806 4284.505543
18 9/29/2023 4288.05 0.525690 4293.575711