
One sided t-test for linear regression?


I am fitting a linear regression and want to test the slope. The default t-test checks whether the slope differs from 0 in either direction, positive or negative. I am only interested in negative slopes.

In this example the slope is positive, which I am not interested in, so the p-value should be large. Instead it is small, because the test currently checks whether the slope is far from 0 in either direction. (I am forcing an intercept of zero, which is what I want.) Can someone help me with the syntax for testing only whether the slope is negative? In this case the p-value should be large.

Also, how can I change the confidence level to, say, 99% or 95%?

import matplotlib.pyplot as plt
import numpy
import statsmodels.api as sm

X = [-0.013459134, 0.01551033, 0.007354476, 0.014686473, -0.014274754, 0.007728445, -0.003034186, -0.007409397]
Y = [-0.010202462, 0.003297546, -0.001406498, 0.004377665, -0.009244517, 0.002136552, 0.006877126, -0.001494624]

# No constant column is added to X, so the fit is forced through the origin.
regression_results = sm.OLS(Y, X, missing="drop").fit()
P_value = regression_results.pvalues[0]  # two-sided p-value for the slope
R_squared = regression_results.rsquared
K_slope = regression_results.params[0]
conf_int = regression_results.conf_int()  # 95% interval by default
low_conf_int = conf_int[0][0]
high_conf_int = conf_int[0][1]

# Scatter plot of the data with the fitted line.
fig, ax = plt.subplots()
ax.grid(True)
ax.scatter(X, Y, alpha=1, color='orchid')
x_pred = numpy.linspace(min(X), max(X), 40)
y_pred = regression_results.predict(x_pred)
ax.plot(x_pred, y_pred, '-', color='darkorchid', linewidth=2)
plt.show()

Solution

  • The p-value for the two-sided t-test is calculated by:

    import scipy.stats as ss
    df = regression_results.df_resid
    # abs() keeps the formula valid when the t-value is negative.
    ss.t.sf(abs(regression_results.tvalues[0]), df) * 2 # About the same as (1 - cdf) * 2.
    # see @user333700's comment
    Out[12]: 0.02903685649821508
    

    For your one-sided test (alternative hypothesis: slope < 0), use the left-tail probability instead:

    ss.t.cdf(regression_results.tvalues[0], df)
    Out[14]: 0.98548157175089246
    

    since only the left tail counts as evidence for a negative slope. The p-value is large here because the estimated slope is positive.

    For the confidence interval, pass the alpha parameter, where alpha is 1 minus the confidence level:

    regression_results.conf_int(alpha=0.01)
    

    for a 99% confidence interval. The default, alpha=0.05, gives a 95% interval.