I am running a simple linear regression to predict price for a particular data set. Besides computing metrics such as mean squared error, I would also like to find which test instances have an actual price within 10% of the predicted price (i.e. the actual price falls in the range predicted price +/- 10%). I'm not sure what the best way to do this is, or whether an existing package can help. I'm currently working with numpy arrays for my X and Y training data.
from sklearn.linear_model import LinearRegression
# Load the train/test split
X_train, X_test, y_train, y_test = prepare_data()
# Fit the model
lm = LinearRegression()
lm.fit(X_train, y_train)
# Compute predictions on the test set
y_pred = lm.predict(X_test)
To get all the values of y_test, you can use np.select. In your case, the condition is that the values of y_test are within 10% of y_pred:
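import numpy as np
# True where y_test is within 10% of y_pred (relative to the predicted price)
condition = [np.abs(y_pred - y_test) / np.abs(y_pred) < 0.1]
choice = [y_test]
# Entries that fail the condition are filled with np.select's default, 0
result = np.select(condition, choice)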
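Note that np.select keeps the array the same length as y_test and fills the entries that fail the condition with its default value of 0. If you would rather have the indices of the matching test instances, or the share of them, a plain boolean mask may be simpler. Here is a minimal sketch of that idea; the function name within_tolerance, the tol parameter, and the demo prices are made up for illustration and are not part of your code:

```python
import numpy as np

def within_tolerance(y_true, y_pred, tol=0.10):
    """Boolean mask: True where y_true lies in y_pred +/- tol * |y_pred|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Multiplying instead of dividing avoids a division by zero
    # when a predicted price happens to be 0.
    return np.abs(y_true - y_pred) <= tol * np.abs(y_pred)

# Hypothetical prices, for illustration only
y_test_demo = np.array([100.0, 200.0, 300.0, 400.0])
y_pred_demo = np.array([105.0, 150.0, 310.0, 500.0])

mask = within_tolerance(y_test_demo, y_pred_demo)
hit_indices = np.flatnonzero(mask)  # positions of test rows inside the band
hit_rate = mask.mean()              # fraction of test rows inside the band
```

With your own arrays you would call within_tolerance(y_test, y_pred), and y_test[mask] or X_test[mask] would then give only the matching instances. If you meant 10% of the actual price rather than of the predicted price, swap the roles of the two arrays on the right-hand side of the comparison.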