Dataframe and updating a new column value in a for loop


I am trying to update a value in a dataframe using a method and a for loop. I pass the dataframe into the method and use the loop to calculate the value I want to write into the last column of each row.

Here is the method:

def vwap2(df):
    sumTpv = 0.00
    sumVolume = 0
    dayVwap = 0.00

    for i, row in df.iterrows():
        #Get all values from each row

        #Find typical price
        tp = (row['HIGH'] + row['LOW'] + row['CLOSE'] + row['OPEN']) / 4
        tpv = tp * row['VOLUME']
        sumTpv = sumTpv + tpv
        sumVolume = sumVolume + row['VOLUME']
        vwap = sumTpv / sumVolume
        #Find VWAP
        #df.assign(VWAP = vwap)
        #row.assign(VWAP = vwap)
        #row["VWAP"] = vwap
        df.set_value(row, 'VWAP', vwap)
        df = df.reindex(row = row)
        df[row] = df[row].astype(float)
        dayVwap = dayVwap + vwap

    print('Day VWAP = ', dayVwap)
    print('TPV sum = ', sumTpv)
    print('Day Volume = ', sumVolume)
    return df

The dataframe already contains the VWAP column, because I add it before passing the df into the method, like this:

df["VWAP"] = ""
#do vwap calculation
df = vwap2(df)

But the resulting values are either all identical, which is wrong, or not written at all. I have tried a few things without success.

Updates

Here is the data I am using; I pull it from Google each time:

                       CLOSE   HIGH      LOW    OPEN  VOLUME        TP  \
 2018-05-10 22:30:00  97.3600  97.48  97.3000  97.460  371766  97.86375   
 1525991460000000000  97.2900  97.38  97.1800  97.350  116164  97.86375   
 1525991520000000000  97.3100  97.38  97.2700  97.270   68937  97.86375   
 1525991580000000000  97.3799  97.40  97.3101  97.330   46729  97.86375   
 1525991640000000000  97.2200  97.39  97.2200  97.365   64823  97.86375   

                          TPV        SumTPV    SumVol       VWAP  
 2018-05-10 22:30:00  3.722224e+08  1.785290e+09  18291710  97.601027  
 1525991460000000000  3.722224e+08  1.785290e+09  18291710  97.601027  
 1525991520000000000  3.722224e+08  1.785290e+09  18291710  97.601027  
 1525991580000000000  3.722224e+08  1.785290e+09  18291710  97.601027  
 1525991640000000000  3.722224e+08  1.785290e+09  18291710  97.601027  

As you can see, every calculated column holds the same value in all rows.

Here is what I am using right now:

def vwap2(df):
    sumTpv = 0.00
    sumVolume = 0
    dayVwap = 0.00

    for i, row in df.iterrows():
        #Get all values from each row

        #Find typical price
        tp = (row['HIGH'] + row['LOW'] + row['CLOSE'] + row['OPEN']) / 4
        df['TP'] = tp

        tpv = tp * row['VOLUME']
        df['TPV'] = tpv

        sumTpv = sumTpv + tpv
        df['SumTPV'] = sumTpv

        sumVolume = sumVolume + row['VOLUME']
        df['SumVol'] = sumVolume

        vwap = sumTpv / sumVolume
        #Find VWAP
        #row.assign(VWAP = vwap)
        #row["VWAP"] = vwap
        #df.set_value(row, 'VWAP', vwap)
        df["VWAP"] = vwap
        dayVwap = dayVwap + vwap

    print('Day VWAP = ', dayVwap)
    print('TPV sum = ', sumTpv)
    print('Day Volume = ', sumVolume)
    return df

Solution

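  • IIUC, you don't need a loop, or even apply. Inside the loop, a statement like df['TP'] = tp assigns one scalar to the entire column, so every pass overwrites all rows and only the values from the last row survive. Direct column assignment and cumsum() give you per-row values instead.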

    Some example data:

    import numpy as np
    import pandas as pd
    
    N = 20
    high = np.random.random(N)
    low = np.random.random(N)
    close = np.random.random(N)
    opening = np.random.random(N)
    volume = np.random.random(N)
    data = {"HIGH":high, "LOW":low, "CLOSE":close, "OPEN":opening, "VOLUME":volume}
    df = pd.DataFrame(data)
    
    df.head()
          CLOSE      HIGH       LOW      OPEN    VOLUME
    0  0.848676  0.260967  0.004188  0.139342  0.931406
    1  0.771065  0.356639  0.495715  0.652106  0.988217
    2  0.288206  0.567776  0.023687  0.809410  0.134134
    3  0.832711  0.508586  0.031569  0.120774  0.891948
    4  0.857051  0.391618  0.155635  0.069054  0.628036
    

    Assign the tp and tpv columns directly, then call cumsum() to get sumTpv and sumVolume:

    df["tp"] = (df['HIGH'] + df['LOW'] + df['CLOSE'] + df['OPEN']) / 4
    df["tpv"] = df.tp * df['VOLUME']
    df["sumTpv"] = df.tpv.cumsum()
    df["sumVolume"] = df.VOLUME.cumsum()
    df["vwap"] = df.sumTpv.div(df.sumVolume)
    
    df.head()
          CLOSE      HIGH       LOW      OPEN    VOLUME        tp       tpv  \
    0  0.848676  0.260967  0.004188  0.139342  0.931406  0.313293  0.291803   
    1  0.771065  0.356639  0.495715  0.652106  0.988217  0.568881  0.562178   
    2  0.288206  0.567776  0.023687  0.809410  0.134134  0.422270  0.056641   
    3  0.832711  0.508586  0.031569  0.120774  0.891948  0.373410  0.333063   
    4  0.857051  0.391618  0.155635  0.069054  0.628036  0.368340  0.231331   
    
         sumTpv  sumVolume      vwap  
    0  0.291803   0.931406  0.313293  
    1  0.853982   1.919624  0.444869  
    2  0.910622   2.053758  0.443393  
    3  1.243685   2.945706  0.422203  
    4  1.475016   3.573742  0.412737  
    

    Update (per OP comment):
    To get dayVwap as the sum of all vwap, use dayVwap = df.vwap.sum().
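
If you would still rather keep the loop, the write has to target a single cell instead of the whole column. The sketch below is one way to do that with DataFrame.at; vwap_loop is a hypothetical name, the sample bars are made up, and it assumes the index labels are unique. It also initialises the VWAP column with NaN instead of "" so the column stays numeric.

```python
import pandas as pd


def vwap_loop(df):
    """Loop version: write each running VWAP into its own row."""
    out = df.copy()
    out["VWAP"] = float("nan")  # float placeholder instead of ""
    sum_tpv = 0.0
    sum_volume = 0.0
    # Iterate over the input and write into the copy
    for i, row in df.iterrows():
        tp = (row["HIGH"] + row["LOW"] + row["CLOSE"] + row["OPEN"]) / 4
        sum_tpv += tp * row["VOLUME"]
        sum_volume += row["VOLUME"]
        # .at sets one cell; out["VWAP"] = value would overwrite every row
        out.at[i, "VWAP"] = sum_tpv / sum_volume
    return out


# Made-up bars, for illustration only
bars = pd.DataFrame({
    "HIGH": [10.0, 12.0, 11.0],
    "LOW": [8.0, 10.0, 9.0],
    "CLOSE": [9.0, 11.0, 10.0],
    "OPEN": [9.0, 11.0, 10.0],
    "VOLUME": [100, 300, 100],
})
looped = vwap_loop(bars)
print(looped["VWAP"])
```

This gives the same per-row values as the cumsum() approach, but it runs row by row in Python, so the vectorised version is usually the better choice for large dataframes.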