Search code examples
pythonexcelspyderfinancetrading

RSI in spyder using data in excel


So I have an excel file containing data on a specific stock.

My excel file contains about 2 months of data, it monitors the Open price, Close price, High Price, Low Price and Volume of trades in 5 minute intervals, so there are about 3000 rows in my file.

I want to calculate the RSI (or EMA if it's easier) of a stock daily, I'm making a summary table that collects the daily data so it converts my table of 3000+ rows into a table with only about 60 rows (each row represents one day).

Essentially I want some sort of code that sorts the excel data by date then calculates the RSI as a single value for that day. RSI is given by: 100-(100/(1+RS)) where RS = average gain of up periods/average loss of down periods.

Note: My excel uses 'Datetime' so each row's 'Datetime' looks something like '2022-03-03 9:30-5:00' and the next row would be '2022-03-03 9:35-5:00', etc. So the code needs to just look at the date and ignore the time I guess.

Some code to maybe help understand what I'm looking for:

So here I'm calling my excel file, I want the code to take the called excel file, group data by date and then calculate the RSI of each day using the formula I wrote above.

dat = pd.read_csv('AMD_5m.csv',index_col='Datetime',parse_dates=['Datetime'],
                  date_parser=lambda x: pd.to_datetime(x, utc=True))
dates = backtest.get_dates(dat.index)
#create a summary table
cols = ['Num. Obs.', 'Num. Trade', 'PnL', 'Win. Ratio','RSI'] #add addtional fields if necessary
summary_table = pd.DataFrame(index = dates, columns=cols)
# loop backtest by dates

This is the code I used to fill out the other columns in my summary table, I'll put my SMA (simple moving average) function below.

for d in dates:
    this_dat = dat.loc[dat.index.date==d]
    #find the number of observations in date d
    summary_table.loc[d]['Num. Obs.'] = this_dat.shape[0] 
    #get trading (i.e. position holding) signals
    signals = backtest.SMA(this_dat['Close'].values, window=10)
    #find the number of trades in date d
    summary_table.loc[d]['Num. Trade'] = np.sum(np.diff(signals)==1) 
    #find PnLs for 100 shares
    shares = 100
    PnL = -shares*np.sum(this_dat['Close'].values[1:]*np.diff(signals))
    if np.sum(np.diff(signals))>0:
        #close position at market close
        PnL += shares*this_dat['Close'].values[-1]
    summary_table.loc[d]['PnL'] = PnL
    #find the win ratio
    ind_in = np.where(np.diff(signals)==1)[0]+1
    ind_out = np.where(np.diff(signals)==-1)[0]+1
    num_win = np.sum((this_dat['Close'].values[ind_out]-this_dat['Close'].values[ind_in])>0)
    if summary_table.loc[d]['Num. Trade']!=0:
        summary_table.loc[d]['Win. Ratio'] = 1. *num_win/summary_table.loc[d]['Num. Trade']

This is my function for calculating Simple Moving Average. I was told to try and adapt this for RSI or for EMA (Exponential Moving Average). Apparently adapting this for EMA isn't too troublesome but I can't figure it out.

def SMA(p,window=10,signal_type='buy only'):
    #input price "p", look-back window "window",
    #signal type = buy only (default) --gives long signals, sell only --gives sell signals, both --gives both long and short signals
    #return a list of signals = 1 for long position and -1 for short position
    signals = np.zeros(len(p))
    if len(p)<window:
        #no signal if no sufficient data
        return signals
    sma = list(np.zeros(window)+np.nan) #the first few prices does not give technical indicator values 
    sma += [np.average(p[k:k+window]) for k in np.arange(len(p)-window)]
    for i in np.arange(len(p)-1):
        if np.isnan(sma[i]):
            continue #skip the open market time window
        if sma[i]<p[i] and (signal_type=='buy only' or signal_type=='both'):
            signals[i] = 1
        elif sma[i]>p[i] and (signal_type=='sell only' or signal_type=='both'):
            signals[i] = -1
    return signals

Solution

  • I have two solutions to this. One is to loop through each group, then add the relevant data to the summary_table, the other is to calculate the whole series and set the RSI column as this.

    I first recreated the data:

    import yfinance
    import pandas as pd
    
    # initially created similar data through yfinance,
    # then copied this to Excel and changed the Datetime column to match yours.
    df = yfinance.download("AAPL", period="60d", interval="5m")
    # copied it and read it as a dataframe
    df = pd.read_clipboard(sep=r'\s{2,}', engine="python")
    df.head()
    #                 Datetime        Open        High         Low       Close   Adj Close   Volume
    #0  2022-03-03 09:30-05:00  168.470001  168.910004  167.970001  168.199905  168.199905  5374241
    #1  2022-03-03 09:35-05:00  168.199997  168.289993  167.550003  168.129898  168.129898  1936734
    #2  2022-03-03 09:40-05:00  168.119995  168.250000  167.740005  167.770004  167.770004  1198687
    #3  2022-03-03 09:45-05:00  167.770004  168.339996  167.589996  167.718094  167.718094  2128957
    #4  2022-03-03 09:50-05:00  167.729996  167.970001  167.619995  167.710007  167.710007   968410
    

    Then I formatted the data and created the summary_table:

    df["date"] = pd.to_datetime(df["Datetime"].str[:16], format="%Y-%m-%d %H:%M").dt.date
    
    # calculate percentage change from open and close of each row
    df["gain"] = (df["Close"] / df["Open"]) - 1
    
    # your summary table, slightly changing the index to use the dates above
    cols = ['Num. Obs.', 'Num. Trade', 'PnL', 'Win. Ratio','RSI'] #add addtional fields if necessary
    summary_table = pd.DataFrame(index=df["date"].unique(), columns=cols)
    

    Option 1:

    # loop through each group, calculate the average gain and loss, then RSI
    for grp, data in df.groupby("date"):
        # average gain for gain greater than 0
        average_gain = data[data["gain"] > 0]["gain"].mean()
        # average loss for gain less than 0
        average_loss = data[data["gain"] < 0]["gain"].mean()
        
        # add to relevant cell of summary_table
        summary_table["RSI"].loc[grp] = 100 - (100 / (1 + (average_gain / average_loss)))
    

    Option 2:

    # define a function to apply in the groupby
    def rsi_calc(series):
        avg_gain = series[series > 0].mean()
        avg_loss = series[series < 0].mean()
        
        return 100 - (100 / (1 + (avg_gain / avg_loss)))
    
    summary_table["RSI"] = df.groupby("date")["gain"].apply(lambda x: rsi_calc(x))
    

    Output (same for each):

    summary_table.head()
    #           Num. Obs. Num. Trade  PnL Win. Ratio          RSI
    #2022-03-03       NaN        NaN  NaN        NaN  -981.214015
    #2022-03-04       NaN        NaN  NaN        NaN   501.950956
    #2022-03-07       NaN        NaN  NaN        NaN  -228.379066
    #2022-03-08       NaN        NaN  NaN        NaN -2304.451654
    #2022-03-09       NaN        NaN  NaN        NaN  -689.824739