python pandas dataframe data-science back-testing

How to properly figure out all possible "long entries" made based on the OHLC data and upper bound and lower bound price series? Pandas related

Say you have a Pandas df that contains the OHLC (short for the Open, High, Low, Close) prices of a particular financial asset.

Also, you have two other Pandas dataframes to consider, one of them called upper_bound that contains a series of prices which are above the close price, and the other called lower_bound that contains a series of prices which are below the close price.

All the necessary data can be found here.

All of these Pandas dataframes share the same index, and this is how it would look like once plotted everything in a single candlestick chart (The pink trend represents the upper_bound, while the white trend represents the lower_bound):

You need to:

Find out the index value at which the low price from the df is lower or equal to the lower bound value.
Find out the next index value at which the high price from the df is greater or equal to the upper bound value.
Estimate the percentage change from the first index value of the lower bound to the second index value of the upper bound.
Append that percentage change, that first index value, and that second index value to another dataframe called possible_long_entries
Repeat this process until there's no more data to analyze

My (possibly bad) approach

I wrote the following Python code in order to solve this problem:

# Find all the possible long entries that could have been made considering the information above
possible_long_entries = pd.DataFrame(columns=['Actual Percentage Change', 'Start Index', 'End Index'])
i=0
while i < (len(df)-1):
    if df['Low Price'][i] <= lower_bound[i]:
        lower_index = i
        j = i + 1
        while j < (len(df)-1):
            if df['High Price'][j] >= upper_bound[j]:
                upper_index = j
                percentage_change = (upper_bound.iat[upper_index] - lower_bound.iat[lower_index]) / lower_bound.iat[lower_index] * 100
                possible_long_entries = possible_long_entries.append({'Actual Percentage Change':percentage_change,'Start Index': lower_index, 'End Index':upper_index},ignore_index=True)
                i = j + 1
                print(i)
                break
            else:
                j += 1
    else:
        i += 1

The problem with this code is the fact that it kinda enters in an infinite loop when i equals to 407, not sure why. After I manually stopped the execution, I checked out the possible_long_entries and these were data that it managed to extract:

final_dict = {'Actual Percentage Change': {0: 3.694220620875114, 1: 2.4230128905797654, 2: 2.1254433367789014, 3: 2.9138599524587625, 4: 3.177040784650736, 5: 1.0867515559002843, 6: 0.08567173253550972, 7: 0.19999498819328332, 8: 3.069342080456284, 9: 1.467935498997383, 10: -0.6867540630203672, 11: 2.019389675661748, 12: 3.1057216745256353, 13: 1.758775161828502}, 'Start Index': {0: 17.0, 1: 50.0, 2: 89.0, 3: 106.0, 4: 113.0, 5: 132.0, 6: 169.0, 7: 193.0, 8: 237.0, 9: 271.0, 10: 285.0, 11: 345.0, 12: 374.0, 13: 401.0}, 'End Index': {0: 38.0, 1: 62.0, 2: 101.0, 3: 109.0, 4: 118.0, 5: 146.0, 6: 185.0, 7: 206.0, 8: 251.0, 9: 281.0, 10: 322.0, 11: 361.0, 12: 396.0, 13: 406.0}}

possible_long_entries = pd.DataFrame(final_dict)

May I get some help here please?

Solution

Finally managed to deal with it. The code below works well assumming you have already copypasted the OHLC Price, upper bound, lower bound data left here. I'm open to learn a better way to get the same output.

# -*- coding: utf-8 -*-
"""
Created on Wed Feb  1 03:11:46 2023

@author: Noah
"""

import pandas as pd
from pandas import Timestamp
from numpy import nan
import mplfinance as mpf

def set_DateTimeIndex(df_trading_pair):
    df_trading_pair = df_trading_pair.set_index('Start Date', inplace=False)
    # Rename the column names for best practices
    df_trading_pair.rename(columns = { "Open Price" : 'Open',
                                       "High Price" : 'High',
                                       "Low Price" : 'Low',
                                       "Close Price" :'Close',
                              }, inplace = True)
    return df_trading_pair

### Here you assign the data to the variables `dict`, `df`, `dict_upper`, `upper_bound`, `dict_lower`, `lower_bound` ###

# trading pair name
trading_pair = "APTUSDT"

# Calculate moving average and std deviation of the close price

window_size = 14
rolling_mean = df['Close Price'].rolling(window=window_size).mean()
rolling_std = df['Close Price'].rolling(window=window_size).std()

# Calculate the upper and lower bounds of the close price based on the moving average and std deviation

upper_bound = rolling_mean + 2 * rolling_std
lower_bound = rolling_mean - 2 * rolling_std

# Create masks to know at which indices the df["Low Price"] was equal or lower than the lower bound
# as well as which indices the df["High Price"] was equal or greater than the upper bound

mask_low = df["Low Price"] <= lower_bound
mask_high = df["High Price"] >= upper_bound

upper_indices = list(mask_high[mask_high].index)
lower_indices = list(mask_low[mask_low].index)

# Now figure out which had been the possible long entries that had been made using the information above
# Keep in mind that this assumes the long entries made reached exactly the corresponding values of the `upper_bound`

data = {"Start Index": [lower_indices[0]], "Endt Index": []}
entry = lower_indices[0]

data = {"Start Index": [], "End Index": []}

entry = lower_indices[0]
for i in range(len(upper_indices)):
    exit = upper_indices[i]
    if exit > entry:
        data["Start Index"].append(entry)
        data["End Index"].append(exit)
        next_entries = [x for x in lower_indices if x > entry and x > exit]
        if len(next_entries) > 0:
            entry = next_entries[0]
        else:
            break
    
possible_long_entries= pd.DataFrame(data)
possible_long_entries['Bullish Percentage Change'] = (upper_bound[possible_long_entries['End Index']].values - lower_bound[possible_long_entries['Start Index']].values)/(lower_bound[possible_long_entries['Start Index']].values)*100

# Mark the possible long entries, in order to do this first create an NaN df that contains the same indices as the original df
# Then assign to each index of the possible_long_entries df its corresponding price from the lower_bound
df_mark_entry_points = pd.DataFrame([float('nan')]*len(df),index=df.index,columns=['Bullish Entries'])
for ix,val in zip(possible_long_entries['Start Index'].values,lower_bound[possible_long_entries['Start Index']].values):
   df_mark_entry_points.loc[ix] = val

# Mark the possible take-profits assuming the highest price would have reached the upper_bound, in order to do this first create an NaN df that contains the same indices as the original df
# Then assign to each index of the possible_long_entries df its corresponding price from the upper_bound
df_mark_tp_points = pd.DataFrame([float('nan')]*len(df),index=df.index,columns=['Bullish Entries'])
for ix,val in zip(possible_long_entries['End Index'].values,upper_bound[possible_long_entries['End Index']].values):
   df_mark_tp_points.loc[ix] = val


# Store the plots of upper and lower bounds as well as the possible long entries for later use
plots_to_add = [mpf.make_addplot(upper_bound,color='#F93BFF'), mpf.make_addplot(lower_bound,color='white'), mpf.make_addplot(df_mark_entry_points,type='scatter',markersize=50,marker='^', color='#00FFE0'), mpf.make_addplot(df_mark_tp_points,type='scatter',markersize=50,marker='v', color='#FFF000')]

# Estimate the percentage change from the lower bound to the uppder bound
bullish_percentage_change_between_bounds = round((upper_bound-lower_bound)/lower_bound*100,2)
bullish_percentage_change_between_bounds.rename('Bulllish Pct Chg', inplace = True)

#If the skewness is between -0.5 and 0.5, the data are fairly symmetrical
#If the skewness is between -1 and — 0.5 or between 0.5 and 1, the data are moderately skewed
#If the skewness is less than -1 or greater than 1, the data are highly skewed

#Kurtosis applied to Financial Markets: https://www.investopedia.com/terms/k/kurtosis.asp

skewness_value = bullish_percentage_change_between_bounds.skew()
kurtosis_value = bullish_percentage_change_between_bounds.kurt()

if (abs(skewness_value) > 0.5):
    
    # Use the median to properly estimate the return on investment per trade
    expected_roi = bullish_percentage_change_between_bounds.median()
    if kurtosis_value > 3:
        print(f'The bullish percentage change between bounds follows a Leptokurtic distribution, and the expected roi is {expected_roi}%')
    elif kurtosis_value < 3:
        print(f'The bullish percentage change between bounds follows a Platikurtic distribution, and the expected roi is {expected_roi}%')

elif (abs(skewness_value) <= 0.5):
    
    # Use the mean to properly estimate the return on investment per trade
    expected_roi = bullish_percentage_change_between_bounds.mean()
    if kurtosis_value > 3:
        print(f'The bullish percentage change between bounds follows a Leptokurtic distribution, and the expected roi is {expected_roi}%')
    elif kurtosis_value < 3:
        print(f'The bullish percentage change between bounds follows a Platikurtic distribution, and the expected roi is {expected_roi}%')
    
print()
# Plot the Close Price, Moving average, upper and lower bounds using a line chart.

# Plotting
# Create my own `marketcolors` style:
mc = mpf.make_marketcolors(up='#2fc71e',down='#ed2f1a',inherit=True)
# Create my own `MatPlotFinance` style:
s  = mpf.make_mpf_style(base_mpl_style=['bmh', 'dark_background'],marketcolors=mc, y_on_right=True)    

# Plot it
candlestick_plot, axlist = mpf.plot(df_trading_pair_date_time_index,
                    figsize=(40,20),
                    figratio=(10, 6),
                    type="candle",
                    style=s,
                    tight_layout=True,
                    datetime_format = '%b %d, %H:%M:%S',
                    ylabel = "Precio ($)",
                    returnfig=True,
                    show_nontrading=True,
                    warn_too_much_data=870, # Silence the Too Much Data Plot Warning by setting a value greater than the amount of rows you want to be plotted
                    addplot = plots_to_add # Add the upper and lower bounds plots as well as the bullish entries to the main plot
                    )
# Add Title
symbol = trading_pair.replace("USDT", "")+"/"+"USDT"
axlist[0].set_title(f"{symbol} - 15m", fontsize=45, style='italic', fontfamily='fantasy')

Output:

The bullish percentage change between bounds follows a Leptokurtic distribution, and the expected roi is 3.57%