python regression line stock

Python Linear Regression Line with Stock Data - Get the Closing Prices on the y-Axis


I used a previous thread here on Stack Overflow to get to this point. I want to make a stock chart that shows the line of best fit. It mostly works, except for one problem: the y-axis shows a normalized scale of -0.10 to 0.25 rather than the price of the stock. I want the price of the stock to be displayed on the y-axis.

#!/usr/bin/env python3

import numpy as np
import pandas_datareader.data as web
import pandas as pd
import datetime
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import statistics as stat

# Get the adjusted close price of IBM from Yahoo
# (the variable is named 'tencent', but the ticker requested is 'IBM')
start = datetime.datetime(2020, 5, 21)
end = datetime.datetime(2021, 5, 21)
tencent = web.DataReader('IBM', 'yahoo', start, end)['Adj Close']

nomalized_return=np.log(tencent/tencent.iloc[0])

df = pd.DataFrame(data=nomalized_return)

df = df.resample('D').asfreq()

# Create a 'x' and 'y' column for convenience
df['y'] = df['Adj Close']     # create a new y-col (optional)
df['x'] = np.arange(len(df))  # create x-col of continuous integers


# Drop the rows that contain missing days
df = df.dropna()

X=df['x'].values[:, np.newaxis]
y=df['y'].values[:, np.newaxis]

# Fit linear regression model using scikit-learn
lin_reg = LinearRegression()
lin_reg.fit(X, y)

# Make predictions w.r.t. 'x' and store it in a column called 'y_pred'
df['y_pred'] = lin_reg.predict(df['x'].values[:, np.newaxis])

df['above']= y + np.std(y)
df['below']= y - np.std(y)
# Plot 'y' and 'y_pred' vs the DatetimeIndex
df[['y', 'y_pred']].plot()


plt.show()

The problem is with these lines:

nomalized_return=np.log(tencent/tencent.iloc[0])

df = pd.DataFrame(data=nomalized_return)

If I replace df = pd.DataFrame(data=nomalized_return) with df = pd.DataFrame(data=tencent), the prices appear on the y-axis, but the regression line ends up being wrong. The image below shows what I get with the code above and illustrates the problem.

chart of IBM with a regression line


Solution

  • You can scale the response back by taking the exponential and multiplying by the first value:

    df['y_pred'] = lin_reg.predict(df['x'].values[:, np.newaxis])
    df['y_unscaled'] = tencent
    df['y_pred_unscaled'] = np.exp(df['y_pred']) * tencent.iloc[0]
    
    df[['y_unscaled', 'y_pred_unscaled']].plot()
    

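The back-transform works because the question's normalization is y = log(p / p0), so p = exp(y) * p0, and the same inverse can be applied to the fitted values. Below is a minimal, self-contained sketch of that round trip. It makes a few assumptions that are not in the original post: the prices are synthetic (randomly generated, not real IBM data) so the example runs without a network connection, np.polyfit stands in for scikit-learn's LinearRegression, and the names prices, log_return and pred_log are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded 'Adj Close' series.
# Assumption: made-up prices on business days, not real IBM data.
dates = pd.bdate_range("2020-05-21", "2021-05-21")
rng = np.random.default_rng(0)
log_steps = rng.normal(loc=0.0005, scale=0.01, size=len(dates))
prices = pd.Series(110.0 * np.exp(np.cumsum(log_steps)),
                   index=dates, name="Adj Close")

# Same transform as in the question: log return relative to the first price.
log_return = np.log(prices / prices.iloc[0])

# Calendar-day offsets as x, so weekends leave gaps in x, which mirrors
# resample('D') + np.arange + dropna in the question's code.
x = (prices.index - prices.index[0]).days.to_numpy()

# Straight-line fit in log space.
# Assumption: np.polyfit is swapped in for LinearRegression here.
slope, intercept = np.polyfit(x, log_return.to_numpy(), deg=1)
pred_log = slope * x + intercept

# Undo the transform: exponentiate, then multiply by the first price.
df = pd.DataFrame({
    "y_unscaled": prices,
    "y_pred_unscaled": np.exp(pred_log) * prices.iloc[0],
})
```

Calling df.plot() followed by plt.show() (with matplotlib.pyplot imported as plt) should then draw both columns in price units. Note that a line that is straight in log space becomes an exponential curve after the back-transform, so the fitted curve will generally not be a straight line on the price axis.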