Search code examples
pythonapache-sparkpysparkspark-koalas

How to calculate an average stock price depending on periods


I am trying to calculate the average opening price for a stock, depending on different periods (week, month, year).

Here you can see a part of my df : My dataframe (987 rows for the complete df)

Firstly, I am trying to calculate the average opening price week by week. I found a solution, but it is unsustainable (it took my computer 5min to finish the calculations). Here it is :

def average_opening_and_closing_prices(df):
    
    array = [0]
    n = df["weekofyear"].count()
    j=0

    for i in range(0,n): 

        array[j] = array[j] + kdf["Open"][i]
        if i != n-1 and kdf["weekofyear"][i] != kdf["weekofyear"][i+1]:
            array.append(0)
            j = j+1
    
    for x in array:
      print(str(x) + " ") 
    
average_opening_and_closing_prices(AMAZON_df)

Could you help me to improve my solution (mainly on execution time) ? Also, for example, I would like to add a column, directly to my df, which contains the results for each week, instead of putting the results in an array.

I am not allowed to use pandas, I can only use pyspark and koalas.


Solution

  • [UPDATED: To include year into the calculation] As you are looking for average price for week (and year) and already added the weekofyear in data frame, panda's itself can do it for you. Just add a column for year and try df.groupby(['year', 'weekofyear']).mean() Sample below:

    import pandas as pd
    
    df = pd.DataFrame({
        'weekofyear' : [1, 1, 1, 2, 2, 2, 3, 3, 3], 
        'year' : [2017, 2017, 2018, 2017, 2017, 2018, 2017, 2017, 2018],
        'Open' : [757, 758, 759, 761, 761, 762, 763, 764, 764]
    })
    
    result_df = df.groupby(['year', 'weekofyear']).mean()
    
    print(result_df)
    

    Output

    Open
    year weekofyear       
    2017 1           757.5
         2           761.0
         3           763.5
    2018 1           759.0
         2           762.0
         3           764.0