python machine-learning scikit-learn linear-regression

Why is a "prediction space" needed?


This is an older exercise on prediction with linear regression, using the Gapminder data. The exercise uses a "prediction space" to compute predictions.

Q1. Why do I need to create a "prediction space"? What is it used for?

Q2. What does computing predictions over the "prediction space" accomplish?

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Read the CSV file into a DataFrame: df
df = pd.read_csv('gapminder.csv')

The data looks like this:

Country,Year,life,population,income,region

Afghanistan,1800,28.211,3280000,603.0,South Asia

Slovak Republic,1960,70.47800000000001,4137224,8693.0,Europe & Central Asia

# Create arrays for features and target variable
y = df.life.values
X = df.fertility.values

# Reshape X and y
y = y.reshape(-1,1)
X = X.reshape(-1,1)

# Create the regressor: reg
reg = LinearRegression()

# Create the prediction space: 50 evenly spaced values between the smallest and largest X
prediction_space = np.linspace(X.min(), X.max()).reshape(-1,1)

# Fit the model to the data
reg.fit(X, y)

# Compute predictions over the prediction space: y_pred
y_pred = reg.predict(prediction_space)

Solution

  • I believe you are taking a DataCamp course.

    I ran into this too. prediction_space and y_pred are used to draw the straight regression line in the plot: prediction_space supplies the x coordinates and y_pred supplies the model's predicted y value at each of them.

    NOTE: for anyone reading this without that context, the code snippet in the question is missing the plotting code:

    # Plot regression line
    import matplotlib.pyplot as plt  # not imported in the question's snippet

    plt.plot(prediction_space, y_pred, color='black', linewidth=3)
    plt.show()
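
To make the idea concrete, here is a minimal sketch on synthetic data. The numbers are made up and only stand in for the Gapminder fertility and life columns; the seed, slope, and noise level are assumptions chosen for illustration. It shows that prediction_space is simply an evenly spaced grid over the observed range of X, and that the predictions over that grid all lie on one straight line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (hypothetical values, not the real Gapminder data)
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 7.0, size=100).reshape(-1, 1)        # feature, e.g. fertility
y = 80.0 - 4.0 * X + rng.normal(0.0, 2.0, size=X.shape)   # target, e.g. life expectancy

# Fit a straight line to the scattered observations
reg = LinearRegression()
reg.fit(X, y)

# Evenly spaced x values from min(X) to max(X); np.linspace returns 50 points by default
prediction_space = np.linspace(X.min(), X.max()).reshape(-1, 1)

# Model output at each grid point: these are the y coordinates of the line
y_pred = reg.predict(prediction_space)

print(prediction_space.shape, y_pred.shape)
```

Passing prediction_space and y_pred to plt.plot draws the fitted line across the full range of the data, typically on top of a scatter plot of X and y. For a straight line, two points would be enough. An evenly spaced grid is a convenient habit because the same code still draws a smooth curve if the model is later swapped for a non-linear one.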