Search code examples
pythonnumpymeandatareader

How do I continiously calculate something based on the past X amount of data? (Please see info for more details)


Goal:

Calculate 50day moving average for each day, based on the past 50 days. I can calculate the mean for the entire dataset, but I am trying to contiously calculate the mean based on the past 50 days...with it changing each day of course!

import numpy as np
import pandas_datareader.data as pdr
import pandas as pd

# Define the instruments to download. We would like to see Apple, Microsoft and the S&P500 index.

ticker = ['AAPL']


#Define the data period that you would like

start_date = '2017-07-01'
end_date = '2019-02-08'

# User pandas_reader.data.DataReader to load the stock prices from Yahoo Finance. 

df = pdr.DataReader(ticker, 'yahoo', start_date, end_date)

# Yahoo Finance gives 'High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close'.

#Export Close PRice, Volume, and Date from yahoo finance 

CloseP = df['Close']
CloseP.head()

Volm = df['Volume']
Volm.head()

Date = df["Date"] = df.index

#create a table with Date, Close Price, and Volume

Table = pd.DataFrame(np.array(Date), columns = ['Date'])
Table['Close Price'] = np.array(CloseP)
Table['Volume'] = np.array(Volm)




print (Table)

#create a column that contiosuly calculates 50 day MA
#This is what I can't get to work! 

MA = np.mean(df['Close'])
Table['Moving Average'] = np.array(MA)


print (Table)

Solution

  • First of all, please, don't use CamelCase to name your variables, as they look as class names otherwise.

    Next, use merge() to join your data frames instead of those yours np.array way:

    >>> table = CloseP.merge(Volm, left_index=True, right_index=True)
    >>> table.columns = ['close', 'volume']  # give names to columns
    >>> table.head(10)
                     close      volume
    Date
    2017-07-03  143.500000  14277800.0
    2017-07-05  144.089996  21569600.0
    2017-07-06  142.729996  24128800.0
    2017-07-07  144.179993  19201700.0
    2017-07-10  145.059998  21090600.0
    2017-07-11  145.529999  19781800.0
    2017-07-12  145.740005  24884500.0
    2017-07-13  147.770004  25199400.0
    2017-07-14  149.039993  20132100.0
    2017-07-17  149.559998  23793500.0
    

    Finally, use combination of rolling(), mean() and dropna() to calculate moving average:

    >>> ma50 = table.rolling(window=50).mean().dropna()
    >>> ma50.head(10)
                     close      volume
    Date
    2017-09-12  155.075401  26092540.0
    2017-09-13  155.398401  26705132.0
    2017-09-14  155.682201  26748954.0
    2017-09-15  156.025201  27248670.0
    2017-09-18  156.315001  27430024.0
    2017-09-19  156.588401  27424424.0
    2017-09-20  156.799201  28087816.0
    2017-09-21  156.952201  28340360.0
    2017-09-22  157.034601  28769280.0
    2017-09-25  157.064801  29254384.0
    

    Please, refer to the docs of mentioned API calls to get more info about their usage. Good luck!