I am trying to update a value in a DataFrame using a method and a for loop. I pass the DataFrame into the method and loop over its rows to calculate the value I want to put into the last column.
Here is the method:
def vwap2(df):
    sumTpv = 0.00
    sumVolume = 0
    dayVwap = 0.00
    for i, row in df.iterrows():
        #Get all values from each row
        #Find typical price
        tp = (row['HIGH'] + row['LOW'] + row['CLOSE'] + row['OPEN']) / 4
        tpv = tp * row['VOLUME']
        sumTpv = sumTpv + tpv
        sumVolume = sumVolume + row['VOLUME']
        vwap = sumTpv / sumVolume
        #Find VWAP
        #df.assign(VWAP = vwap)
        #row.assign(VWAP = vwap)
        #row["VWAP"] = vwap
        df.set_value(row, 'VWAP', vwap)
        df = df.reindex(row = row)
        df[row] = df[row].astype(float)
        dayVwap = dayVwap + vwap
    print('Day VWAP = ', dayVwap)
    print('TPV sum = ', sumTpv)
    print('Day Volume = ', sumVolume)
    return df
The DataFrame already has the column, because I add it before passing the df into the method, like this:
df["VWAP"] = ""
#do vwap calculation
df = vwap2(df)
But the values are either all the same, which should not be the case, or are not written at all. I tried a few things, but without success.
Here is the data that I am using; I pull it from Google each time:
CLOSE HIGH LOW OPEN VOLUME TP \
2018-05-10 22:30:00 97.3600 97.48 97.3000 97.460 371766 97.86375
1525991460000000000 97.2900 97.38 97.1800 97.350 116164 97.86375
1525991520000000000 97.3100 97.38 97.2700 97.270 68937 97.86375
1525991580000000000 97.3799 97.40 97.3101 97.330 46729 97.86375
1525991640000000000 97.2200 97.39 97.2200 97.365 64823 97.86375
TPV SumTPV SumVol VWAP
2018-05-10 22:30:00 3.722224e+08 1.785290e+09 18291710 97.601027
1525991460000000000 3.722224e+08 1.785290e+09 18291710 97.601027
1525991520000000000 3.722224e+08 1.785290e+09 18291710 97.601027
1525991580000000000 3.722224e+08 1.785290e+09 18291710 97.601027
1525991640000000000 3.722224e+08 1.785290e+09 18291710 97.601027
As you can see, all the calculated values are the same in every row.
Here is what I am using right now:
def vwap2(df):
    sumTpv = 0.00
    sumVolume = 0
    dayVwap = 0.00
    for i, row in df.iterrows():
        #Get all values from each row
        #Find typical price
        tp = (row['HIGH'] + row['LOW'] + row['CLOSE'] + row['OPEN']) / 4
        df['TP'] = tp
        tpv = tp * row['VOLUME']
        df['TPV'] = tpv
        sumTpv = sumTpv + tpv
        df['SumTPV'] = sumTpv
        sumVolume = sumVolume + row['VOLUME']
        df['SumVol'] = sumVolume
        vwap = sumTpv / sumVolume
        #Find VWAP
        #row.assign(VWAP = vwap)
        #row["VWAP"] = vwap
        #df.set_value(row, 'VWAP', vwap)
        df["VWAP"] = vwap
        dayVwap = dayVwap + vwap
    print('Day VWAP = ', dayVwap)
    print('TPV sum = ', sumTpv)
    print('Day Volume = ', sumVolume)
    return df
IIUC, you don't need a loop, or even apply: you can use direct column assignment and cumsum() to get what you're looking for.
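As a side note on why the loop in the question fills every row with the same number: an assignment like df["VWAP"] = vwap inside the loop sets the whole column to that scalar, so only the value from the last iteration survives. Here is a minimal sketch of that behaviour with made-up data (the SumVol and SumVolAt column names are just for illustration), along with a per-cell write using .at in case a loop is really wanted:

```python
import pandas as pd

# Made-up data, purely for illustration.
df = pd.DataFrame({"VOLUME": [10, 20, 30]})

# Scalar assignment inside the loop: each pass overwrites the whole column,
# so every row ends up holding the value from the final iteration.
running = 0
for i, row in df.iterrows():
    running += row["VOLUME"]
    df["SumVol"] = running
print(df["SumVol"].tolist())  # [60, 60, 60]

# Writing one cell at a time with .at keeps a separate value per row.
df["SumVolAt"] = 0.0  # create the column up front
running = 0
for i, row in df.iterrows():
    running += row["VOLUME"]
    df.at[i, "SumVolAt"] = running
print(df["SumVolAt"].tolist())  # [10.0, 30.0, 60.0]
```

The per-cell version works, but it is typically much slower than the vectorized approach on large frames.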
Some example data:
import numpy as np
import pandas as pd
N = 20
high = np.random.random(N)
low = np.random.random(N)
close = np.random.random(N)
opening = np.random.random(N)
volume = np.random.random(N)
data = {"HIGH":high, "LOW":low, "CLOSE":close, "OPEN":opening, "VOLUME":volume}
df = pd.DataFrame(data)
df.head()
CLOSE HIGH LOW OPEN VOLUME
0 0.848676 0.260967 0.004188 0.139342 0.931406
1 0.771065 0.356639 0.495715 0.652106 0.988217
2 0.288206 0.567776 0.023687 0.809410 0.134134
3 0.832711 0.508586 0.031569 0.120774 0.891948
4 0.857051 0.391618 0.155635 0.069054 0.628036
Assign the tp and tpv columns directly, then apply cumsum to get sumTpv and sumVolume:
df["tp"] = (df['HIGH'] + df['LOW'] + df['CLOSE'] + df['OPEN']) / 4
df["tpv"] = df.tp * df['VOLUME']
df["sumTpv"] = df.tpv.cumsum()
df["sumVolume"] = df.VOLUME.cumsum()
df["vwap"] = df.sumTpv.div(df.sumVolume)
df.head()
CLOSE HIGH LOW OPEN VOLUME tp tpv \
0 0.848676 0.260967 0.004188 0.139342 0.931406 0.313293 0.291803
1 0.771065 0.356639 0.495715 0.652106 0.988217 0.568881 0.562178
2 0.288206 0.567776 0.023687 0.809410 0.134134 0.422270 0.056641
3 0.832711 0.508586 0.031569 0.120774 0.891948 0.373410 0.333063
4 0.857051 0.391618 0.155635 0.069054 0.628036 0.368340 0.231331
sumTpv sumVolume vwap
0 0.291803 0.931406 0.313293
1 0.853982 1.919624 0.444869
2 0.910622 2.053758 0.443393
3 1.243685 2.945706 0.422203
4 1.475016 3.573742 0.412737
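If you'd rather keep the vwap2(df) call from the question, the same steps could be wrapped in a function. This is only a sketch: it assumes the column names from the question (HIGH, LOW, CLOSE, OPEN, VOLUME), skips the intermediate columns, and returns a copy rather than modifying the input; the sample frame is made up.

```python
import pandas as pd

def vwap2(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a running VWAP column added."""
    out = df.copy()
    # Typical price as defined in the question: mean of the four prices.
    tp = (out["HIGH"] + out["LOW"] + out["CLOSE"] + out["OPEN"]) / 4
    # Running sum of price * volume divided by running sum of volume.
    out["VWAP"] = (tp * out["VOLUME"]).cumsum() / out["VOLUME"].cumsum()
    return out

# Tiny made-up sample so the arithmetic is easy to check by hand.
sample = pd.DataFrame({
    "HIGH": [2.0, 4.0],
    "LOW": [2.0, 4.0],
    "CLOSE": [2.0, 4.0],
    "OPEN": [2.0, 4.0],
    "VOLUME": [1, 3],
})
result = vwap2(sample)
print(result["VWAP"].tolist())  # [2.0, 3.5]
```

Called as df = vwap2(df), this would stand in for the loop-based method, and there is no need to create an empty VWAP column beforehand.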
Update (per OP comment):
To get dayVwap as the sum of all vwap, use dayVwap = df.vwap.sum().
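A small sketch of that, using made-up running values in place of the vwap column. It also shows the last running value, on the assumption (mine, not the OP's) that a single volume-weighted average for the whole day may be what is wanted, since the final cumulative entry equals total TPV divided by total volume:

```python
import pandas as pd

# Made-up running VWAP values standing in for the df["vwap"] column.
df = pd.DataFrame({"vwap": [2.0, 3.5, 3.0]})

day_vwap_sum = df["vwap"].sum()       # sum of all running values, as requested
day_vwap_last = df["vwap"].iloc[-1]   # final running value for the whole period
print(day_vwap_sum, day_vwap_last)    # 8.5 3.0
```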