python numpy scikit-learn linear-regression statsmodels

Classifying predicted values using a prediction interval


I am running a simple linear regression to predict price for a particular data set. Besides computing metrics such as mean squared error, I would like to find which test instances fall within a prediction interval of 10%, i.e. the actual price lies in the range predicted price +/- 10%. I'm not sure what the best way to do this is, or whether an existing package can help. I'm currently working with NumPy arrays for my X and y training data.


from sklearn.linear_model import LinearRegression

# Load in train/test split data
X_train, X_test, y_train, y_test = prepare_data()

# Fit the model
lm = LinearRegression()
lm.fit(X_train, y_train)

# Compute predictions
y_pred = lm.predict(X_test)
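Since `prepare_data()` isn't shown, the setup can be reproduced with synthetic data for experimentation. In this sketch `make_synthetic_prices` is a hypothetical stand-in (its coefficients and noise level are made up), and the mean squared error comes from scikit-learn's `sklearn.metrics.mean_squared_error`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def make_synthetic_prices(n=500, seed=0):
    """Hypothetical stand-in for prepare_data(): two features and a noisy linear price."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 10, size=(n, 2))
    y = 100 + 20 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 10, size=n)
    # Returns X_train, X_test, y_train, y_test as NumPy arrays (80/20 split)
    return train_test_split(X, y, test_size=0.2, random_state=seed)


X_train, X_test, y_train, y_test = make_synthetic_prices()

lm = LinearRegression()
lm.fit(X_train, y_train)
y_pred = lm.predict(X_test)

# Mean squared error, plus its square root (which is in the same units as price)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}")
```

With a fixed seed the split and predictions are reproducible, so any per-instance check can be compared run to run.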



Solution

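  • To get the values of y_test that are within 10% of y_pred, you can use np.select. In your case, the condition is that the absolute difference between y_test and y_pred is at most 10% of y_pred (dividing by y_test instead would measure the band relative to the actual price, not the predicted one):

    import numpy as np

    # True where the actual price lies within +/-10% of the predicted price
    condition = [np.abs(y_test - y_pred) <= 0.1 * np.abs(y_pred)]
    choice = [y_test]
    # Rows failing the condition get `default` (0 if omitted); NaN makes them easy to spot
    result = np.select(condition, choice, default=np.nan)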
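Note that `np.select` returns an array the same length as `y_test`. Rows that fail the condition are filled with the default (0 unless `default=` is given), so they still have to be filtered out afterwards. If you want the matching instances themselves, a plain boolean mask is often simpler. Here is a sketch with a hypothetical `within_band` helper (not part of NumPy or scikit-learn) and made-up prices:

```python
import numpy as np


def within_band(y_true, y_pred, tol=0.1):
    """Hypothetical helper: True where |y_true - y_pred| <= tol * |y_pred|,
    i.e. the actual value lies within +/- tol (as a fraction) of the prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.abs(y_true - y_pred) <= tol * np.abs(y_pred)


# Made-up actual and predicted prices
y_test = np.array([100.0, 250.0, 80.0, 300.0])
y_pred = np.array([105.0, 200.0, 95.0, 310.0])

mask = within_band(y_test, y_pred)   # [True, False, False, True]
idx = np.flatnonzero(mask)           # indices of test instances inside the band: [0, 3]
matching = y_test[mask]              # their actual prices: [100.0, 300.0]
share = mask.mean()                  # fraction of test instances inside the band: 0.5

# For comparison: np.select keeps every row and zero-fills the failures
selected = np.select([mask], [y_test])  # [100.0, 0.0, 0.0, 300.0]
```

Because the mask is an ordinary boolean array, the same calls work on the real `y_test` and `y_pred` from the question, and `mask.mean()` gives the share of test instances within +/- 10%.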