Tags: python, machine-learning, mean-square-error

In Python ML both my RMSE & MAE are consistently calculated as 0


Here is my code:

X = store1.drop(['Store','Date', 'Holiday_Flag','Days','Temperature'], axis=1)
y = store1['Weekly_Sales']

# scaling the predictor data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_sc = sc.fit_transform(X)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_sc, y, test_size=0.2, random_state=21)

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
y_pred = lin_reg.predict(X_test)

print("MAE: {}" .format(mean_absolute_error(y_test, y_pred)))
print("RMSE: {}" .format(mean_squared_error(y_test, y_pred)))

pic of notebook: https://i.sstatic.net/Z48nX8dm.png

The code looks correct to me, but an error of exactly 0 seems too good to be true. I would appreciate some knowledgeable guidance.


Solution

  • You forgot to drop 'Weekly_Sales' from X. Because the target column is still among the features, the model can simply copy it to make perfect predictions, which is why both MAE and RMSE come out as 0. This is target leakage rather than genuine accuracy.

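    Secondly, it is better to split the dataset first and then apply standard scaling. Fitting the scaler on the entire dataset before the train/test split lets the test rows influence the mean and standard deviation used for scaling, which is a form of data leakage. Fit the scaler on the training set only, then use it to transform the test set. For more information on data leakage, see this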
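
Both points can be sketched together as follows. This is only a minimal illustration under stated assumptions: the real store1 data is not shown in the question, so the DataFrame below is synthetic, and the feature columns Fuel_Price, CPI and Unemployment are hypothetical stand-ins for whatever columns remain after the drop. A scikit-learn pipeline is one convenient way (not the only one) to make sure the scaler is fitted on the training rows only. Note also that mean_squared_error returns the MSE by default, so the sketch takes its square root to report an actual RMSE.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for store1 (assumption: column names and values are made up).
rng = np.random.default_rng(0)
n = 200
store1 = pd.DataFrame({
    "Store": 1,
    "Date": pd.date_range("2010-02-05", periods=n, freq="W"),
    "Holiday_Flag": rng.integers(0, 2, n),
    "Days": np.arange(n),
    "Temperature": rng.normal(60, 15, n),
    "Fuel_Price": rng.normal(3.0, 0.4, n),
    "CPI": rng.normal(210, 5, n),
    "Unemployment": rng.normal(8, 1, n),
})
store1["Weekly_Sales"] = (
    1_500_000
    + 20_000 * store1["Fuel_Price"]
    - 10_000 * store1["Unemployment"]
    + rng.normal(0, 50_000, n)  # noise, so a perfect fit is impossible
)

# Drop the target together with the unused columns, so it cannot leak into X.
X = store1.drop(
    ["Store", "Date", "Holiday_Flag", "Days", "Temperature", "Weekly_Sales"],
    axis=1,
)
y = store1["Weekly_Sales"]

# Split first; the scaler has not seen any data yet.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=21
)

# The pipeline fits StandardScaler on X_train only and reuses those
# statistics when transforming X_test inside predict().
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # square root of MSE
print("MAE: {:.2f}".format(mae))
print("RMSE: {:.2f}".format(rmse))
```

With the target removed from X, the errors should no longer be 0. If they still are, it is worth checking whether another remaining column is derived from Weekly_Sales.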