Tags: python, matplotlib, math, plot, linear-regression

How to plot upper and lower boundary with a LINEAR line on a scatter plot?


I have a data frame df with columns A and Q. I am using this code to fit a power-law line and draw it over a scatter plot of the data.

import numpy as np
import matplotlib.pyplot as plt

# Equation of the line to plot: Q = alpha * A^beta, i.e. ln(Q) = a + b*ln(A), i.e. y = a + b*x

x = np.log(df['A'])
y = np.log(df['Q'])

# deriving b, a (x is already ln(A), so it must not be logged a second time)
b, a = np.polyfit(x, y, 1)

# deriving alpha and beta from ln(Q) = a + b*ln(A): a = ln(alpha), b = beta
alpha = np.exp(a)
beta = b

Q = df['Q'].values
A = df['A'].values

#equation of line
q = alpha * np.power(A,beta)

#plotting the points and line
plt.scatter(A,Q)
plt.plot(A,q, '-r')
plt.yscale('log')
plt.xscale('log')

This gives the following output, which is similar to a regression line.

[Image: log-log scatter plot of Q against A with the fitted line]

But I want to plot the same line shifted to form an upper and a lower boundary. Each boundary should pass through the farthest point on its side (measured perpendicular to the green line) and have the same slope as the continuous green line, as shown below.

[Image: the same scatter plot with the desired upper and lower boundary lines parallel to the fitted line]


Solution

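  • The idea is to first find the index of the point where Q / np.power(A, beta) is minimal (respectively maximal), i.e. the point lying farthest below (above) the fitted line on the log-log plot. Because the boundaries are parallel to the fitted line, the point with the largest vertical offset is also the one with the largest perpendicular distance. With this point, alpha_min can be calculated such that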

    Q[pos_min] == alpha_min * np.power(A[pos_min], beta), thus

    alpha_min = Q[pos_min] / np.power(A[pos_min], beta).

    As such lines can extend quite far away from the original points, it can help to restore the x and y limits (thus clipping the plot to the original region).

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame()
    df['A'] = 10 ** np.random.uniform(0, 1, 1000) ** 2
    df['Q'] = 10 ** np.random.uniform(0, 1, 1000) ** 2
    
    x = np.log(df['A'])
    y = np.log(df['Q'])
    
    # deriving b, a (x is already ln(A), so it must not be logged a second time)
    b, a = np.polyfit(x, y, 1)
    
    # deriving alpha and beta from ln(Q) = a + b*ln(A): a = ln(alpha), b = beta
    alpha = np.exp(a)
    beta = b
    
    Q = df['Q'].values
    A = df['A'].values
    
    # plotting the points and line
    plt.yscale('log')
    plt.xscale('log')
    plt.scatter(A, Q, color='b')
    
    # equation of line
    xmin, xmax = plt.xlim() # the limits of the x-axis for drawing the line
    x = np.linspace(xmin, xmax, 50)
    q = alpha * np.power(x, beta)
    plt.plot(x, q, '-r')
    ymin, ymax = plt.ylim()  # store the limits of the scatter and line plot so they can be restored later
    
    pos_min = np.argmin(Q / np.power(A, beta))
    pos_max = np.argmax(Q / np.power(A, beta))
    
    alpha_min = Q[pos_min] / np.power(A[pos_min], beta)
    alpha_max = Q[pos_max] / np.power(A[pos_max], beta)
    
    # plt.scatter(A[pos_min], Q[pos_min], s=100, fc='none', ec='r', lw=3)
    # plt.scatter(A[pos_max], Q[pos_max], s=100, fc='none', ec='g', lw=3)
    
    plt.plot(x, (alpha_max) * np.power(x, beta), '--r')
    plt.plot(x, (alpha_min) * np.power(x, beta), '--r')
    
    plt.xlim(xmin, xmax)  # restore the limits of the scatter plot
    plt.ylim(ymin, ymax)
    plt.show()
    

    [Image: example plot]
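
The boundary calculation can also be written without the plotting code. The following is a minimal sketch, not the answer's exact code: the helper name power_law_envelope is hypothetical, and it assumes the fit is done directly on ln(A) and ln(Q), so that the slope of the fitted line is beta itself. It works in log space, where the offset ln(Q) - beta*ln(A) of each point is the intercept of the slope-beta line through that point; the smallest and largest offsets give the lower and upper boundaries.

```python
import numpy as np


def power_law_envelope(A, Q):
    """Fit Q = alpha * A**beta in log space and return
    (alpha, beta, alpha_min, alpha_max), where alpha_min and alpha_max
    are the multipliers of the parallel lower and upper boundary lines.

    Hypothetical helper for illustration; assumes all A and Q are > 0.
    """
    log_A = np.log(np.asarray(A, dtype=float))
    log_Q = np.log(np.asarray(Q, dtype=float))

    # Straight-line fit ln(Q) = log_alpha + beta * ln(A)
    beta, log_alpha = np.polyfit(log_A, log_Q, 1)

    # Intercept of the slope-beta line passing through each point
    offsets = log_Q - beta * log_A

    return np.exp(log_alpha), beta, np.exp(offsets.min()), np.exp(offsets.max())


# Synthetic data (an assumption, only to show the call)
rng = np.random.default_rng(0)
A = 10 ** rng.uniform(0, 1, 200)
Q = 3.0 * A ** 1.5 * 10 ** rng.uniform(-0.2, 0.2, 200)

alpha, beta, alpha_min, alpha_max = power_law_envelope(A, Q)

# Every point satisfies alpha_min * A**beta <= Q <= alpha_max * A**beta
lower = alpha_min * A ** beta
upper = alpha_max * A ** beta
```

The returned alpha_min and alpha_max can be passed to plt.plot in the same way as in the code above, for example plt.plot(x, alpha_max * np.power(x, beta), '--r').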