My calculation is supposed to produce a number between 0 and 100 based on financial stock data. Each day this number technically starts at 0, and with each new tick/row it is recalculated from all the data available so far for that day. So, as you would assume, when used live this number is constantly changing with each tick. The ultimate problem I am trying to solve is to put this calculation into a dataframe as a new column. What I can't seem to figure out is how to calculate one row at a time, including all past rows, in a dataframe.
For example, I would like to start by calculating just rows 1 and 2. The result will be in its own column on row 2. The result on row 3 would be a calculation of rows 1, 2, and 3. Then the same thing for row 4 and so forth.
My calculation:
ranging=df.h.max()-df.l.min()
percent=round(100*((df.c-df.l.min())/ranging),2)
So I originally asked ChatGPT to solve this problem first before coming here. Here is what it gave me without using my calculation:
df = pd.DataFrame(columns=['data','anchored_avg'])
# define your data points
data_points = [1,2,3,4,5,6,7,8,9,10]
anchored_avg = data_points[0]
for i, current_data_point in enumerate(data_points):
    df = df.append({'data': current_data_point}, ignore_index=True)
    if i > 0:
        anchored_avg = (anchored_avg*i + current_data_point)/(i+1)
    df.loc[i,'anchored_avg'] = anchored_avg
I asked it to do this with my own calculation, but the code didn't work. Here is what I tried with my own calculation (df is OHLC (Open, High, Low, Close) stock data):
for i, current_data_point in enumerate(df[['o','h','l','c']]):
    df= df.append(current_data_point, ignore_index=True)
    if i > 0:
        ranging=df.h.max()-df.l.min()
        percent=round(100*((df.c-df.l.min())/ranging),2)
    df.loc[i,'percent'] = percent
The error I get from this is "cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid"
Here is what I did on my own, but it does not give the result I want (mask_set represents dataframes for each day I want to use the calculation on):
mask_set = [mask_1,mask_2,mask_3,mask_4,mask_5,mask_6,mask_7,mask_8,mask_9,mask_10]
percent_set = []
for i in mask_set:
    ranging=df[i].h.max()-df[i].l.min()
    percent=round(100*((df[i].c-df[i].l.min())/ranging),2)
    percent_set.append(percent)
percent_comb = pd.concat(percent_set)
df['Symb_perc'] = percent_comb
Output sample:
                           o        h        l        c  volume  complete  Symb_perc
time
2023-01-13 12:00:00  0.63638  0.63638  0.63635  0.63636       9      True      35.40
2023-01-13 12:00:05  0.63634  0.63636  0.63634  0.63636      11      True      35.40
2023-01-13 12:00:10  0.63639  0.63644  0.63638  0.63644       6      True      36.40
2023-01-13 12:00:15  0.63644  0.63651  0.63644  0.63649      12      True      37.02
2023-01-13 12:00:20  0.63650  0.63654  0.63650  0.63654       5      True      37.64
2023-01-13 12:00:25  0.63654  0.63659  0.63654  0.63659       8      True      38.26
2023-01-13 12:00:30  0.63659  0.63659  0.63652  0.63656      10      True      37.89
2023-01-13 12:00:35  0.63658  0.63658  0.63657  0.63658       5      True      38.14
2023-01-13 12:00:40  0.63658  0.63667  0.63658  0.63667      10      True      39.25
2023-01-13 12:00:45  0.63668  0.63668  0.63666  0.63667       5      True      39.25
I know this calculation is wrong because when I do use this calculation with a live feed, the numbers are way off by a large margin.
I think what you are looking for is the family of cumulative functions. Here is the problem you want to solve, if I read it correctly:
For each row in the dataframe, calculate the number percentage that equals the value in column C, minus the minimum of column L from the beginning up to that row, divided by (the maximum of column H from the beginning up to that row, minus the minimum of column L from the beginning up to that row).
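Read literally, that description is an expanding-window calculation, and it can be sketched as a plain loop before reaching for anything faster. This is only an illustration under assumptions: the column names H/L/C follow the test data used further down, the three rows are copied from the question's sample, and `expanding_percent` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

# First three rows of the question's sample (column names are assumptions).
df = pd.DataFrame(
    {
        "H": [0.63638, 0.63636, 0.63644],
        "L": [0.63635, 0.63634, 0.63638],
        "C": [0.63636, 0.63636, 0.63644],
    }
)


def expanding_percent(frame):
    """Hypothetical helper: for row i, use only rows 0..i."""
    out = []
    for i in range(len(frame)):
        seen = frame.iloc[: i + 1]      # rows from the start up to row i
        low = seen["L"].min()           # lowest low so far
        high = seen["H"].max()          # highest high so far
        close = frame["C"].iloc[i]      # close of the current row only
        out.append(round(100 * (close - low) / (high - low), 2))
    return pd.Series(out, index=frame.index)


df["percentage"] = expanding_percent(df)
print(df)
```

This recomputes the min and max from scratch for every row, so the cost grows quadratically with the number of rows. It is meant to pin down the definition rather than to be run on a full tick history.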
Translated into cumulative functions:
For each row in the dataframe, calculate the number percentage that equals the value in column C, minus the cumulative min of L, divided by (the cumulative max of H minus the cumulative min of L).
Then we can use the cummin() and cummax() functions. The full code is below:
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd
TESTDATA = StringIO("""12:00:00 0.63638 0.63638 0.63635 0.63636
12:00:05 0.63634 0.63636 0.63634 0.63636
12:00:10 0.63639 0.63644 0.63638 0.63644
12:00:15 0.63644 0.63651 0.63644 0.63649
12:00:20 0.63650 0.63654 0.63650 0.63654
12:00:25 0.63654 0.63659 0.63654 0.63659
12:00:30 0.63659 0.63659 0.63652 0.63656
12:00:35 0.63658 0.63658 0.63657 0.63658
12:00:40 0.63658 0.63667 0.63658 0.63667
12:00:45 0.63668 0.63668 0.63666 0.63667
""")
df = pd.read_csv(TESTDATA, sep=" ", names=["O", "H", "L", "C"])
df['percentage'] = round(100*(df['C']-df['L'].cummin())/(df['H'].cummax() - df['L'].cummin()), 2)
df
It prints:
      O        H        L        C  percentage
0.63638  0.63638  0.63635  0.63636       33.33
0.63634  0.63636  0.63634  0.63636       50.00
0.63639  0.63644  0.63638  0.63644      100.00
0.63644  0.63651  0.63644  0.63649       88.24
0.63650  0.63654  0.63650  0.63654      100.00
0.63654  0.63659  0.63654  0.63659      100.00
0.63659  0.63659  0.63652  0.63656       88.00
0.63658  0.63658  0.63657  0.63658       96.00
0.63658  0.63667  0.63658  0.63667      100.00
0.63668  0.63668  0.63666  0.63667       97.06
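The question also says the number should restart each day, which the single-day test data above does not exercise. One possible way to handle that is to group by calendar date and take the cumulative min and max within each group. The sketch below assumes the frame has a DatetimeIndex and the lowercase column names `h`, `l`, `c` from the question; the two-day data is made up purely for illustration.

```python
import pandas as pd

# Made-up two-day sample; the index and lowercase column names are
# assumptions modelled on the question's output.
idx = pd.to_datetime(
    [
        "2023-01-12 12:00:00",
        "2023-01-12 12:00:05",
        "2023-01-13 12:00:00",
        "2023-01-13 12:00:05",
    ]
)
df = pd.DataFrame(
    {
        "h": [10.0, 12.0, 20.0, 20.0],
        "l": [8.0, 9.0, 18.0, 16.0],
        "c": [9.0, 11.0, 19.0, 17.0],
    },
    index=idx,
)

day = df.index.normalize()                # midnight timestamp of each row's day
run_low = df["l"].groupby(day).cummin()   # running low, restarted every day
run_high = df["h"].groupby(day).cummax()  # running high, restarted every day

df["percent"] = (100 * (df["c"] - run_low) / (run_high - run_low)).round(2)
print(df)
```

If the running high equals the running low (for instance on a day's first tick where h == l), the division yields NaN or inf, so that case may need separate handling.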