Why am I getting "Error: Problem with `mutate()` column `regression1`"?


I am working on an assignment where I have to evaluate a predictive model by its RMSE (root mean squared error) on the test data. I have already built a linear regression model that predicts wine quality (numeric) from all available predictor variables, fitted on the training data. Below is my current code. The full error is:

    Error: Problem with `mutate()` column `regression1`.
    ℹ `regression1 = predict(regression1, newdata = my_type_test)`.
    ✖ no applicable method for 'predict' applied to an object of class "c('double', 'numeric')"

install.packages("rsample")
library(rsample)

my_type_split <- initial_split(my_type, prop = 0.7)
my_type_train <- training(my_type_split)
my_type_test <- testing(my_type_split)  

my_type_train

regression1 <- lm(formula = quality ~ fixed.acidity + volatile.acidity + citric.acid + chlorides + free.sulfur.dioxide + total.sulfur.dioxide +
                  density + pH + sulphates + alcohol, data = my_type_train)

summary(regression1)
regression1

install.packages("caret")
library(caret)
install.packages("yardstick")
library(yardstick)
library(tidyverse)

my_type_test <- my_type_test %>% 
  mutate(regression1 = predict(regression1, newdata = my_type_test)) %>%
  
rmse(my_type_test, price, regression1)
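The last line of the error is an S3 dispatch failure: `predict()` received a plain numeric vector instead of the `lm` object. One plausible cause is that `my_type_test` already had a numeric column called `regression1`, for example left over from an earlier run. This is an assumption, not something the code above confirms. Inside `mutate()`, data columns take precedence over objects of the same name in the environment. The base-R sketch below reproduces the effect with `with()`, which masks names the same way, on the built-in `mtcars` data. The toy model and the name `test_df` are hypothetical and used only for illustration.

```r
# Toy model stored under the same name the question uses.
regression1 <- lm(mpg ~ wt, data = mtcars)

# Hypothetical test set that already contains a numeric column named
# 'regression1' (e.g. left over from an earlier run of the pipeline).
test_df <- mtcars
test_df$regression1 <- 0

# with() looks up names in the data first, so the column masks the model,
# much like data masking inside dplyr::mutate().
masked_class <- with(test_df, class(regression1))
print(masked_class)

# predict() on that numeric vector has no S3 method to dispatch to.
err <- tryCatch(
  with(test_df, predict(regression1, newdata = test_df)),
  error = function(e) conditionMessage(e)
)
print(err)

# Outside the data frame, the name still refers to the lm object.
print(class(regression1))
```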

Solution

  • Many of the steps you take are probably unnecessary.
    A minimal example that should achieve the same thing:

    # Set seed for reproducibility
    set.seed(42)
    # Use the built-in 'mtcars' dataset
    data <- mtcars
    # Randomly flag each row: TRUE (train) with probability 0.2,
    # FALSE (test) with probability 0.8
    split <- sample(
       size = nrow(data),
       x = c(TRUE, FALSE),
       replace = TRUE,
       prob = c(0.2, 0.8)
    )
    # Split the data into train and test sets
    train <- data[split, ]
    test <- data[!split, ]
    
    # Train a linear model
    fit <- lm(mpg ~ disp + hp + wt + qsec + am + gear, data = train)
    
    # Predict mpg in test set
    prediction <- predict(fit, newdata = test)
    

    Result:

    > caret::RMSE(prediction, test$mpg)
    [1] 4.116142
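If you want to keep the dplyr/yardstick approach from the question, three changes are needed in its last statements. First, give the prediction column a name other than `regression1`, so it cannot mask the model. Second, drop the trailing `%>%` after `mutate()`. Otherwise the mutated data frame is piped in as the first argument of `rmse()`. Third, pass the real outcome column as `truth`, presumably `quality`, since the model predicts quality and `price` is not part of the formula. The sketch below applies these changes to `mtcars` and assumes the dplyr and yardstick packages are installed. The 70/30 split and the column name `predicted` are illustrative choices, not taken from the question.

```r
library(dplyr)
library(yardstick)

# Reproducible 70/30 train/test split by row index.
set.seed(42)
n <- nrow(mtcars)
train_rows <- sample(n, size = round(0.7 * n))
train <- mtcars[train_rows, ]
test  <- mtcars[-train_rows, ]

# Fit the model on the training rows only.
fit <- lm(mpg ~ disp + hp + wt + qsec + am + gear, data = train)

# Store predictions under a name that differs from the model object,
# and end the pipe here instead of piping into rmse().
test <- test %>%
  mutate(predicted = predict(fit, newdata = test))

# truth = observed outcome column, estimate = prediction column.
result <- rmse(test, truth = mpg, estimate = predicted)
print(result)

# Same value from the definition sqrt(mean((observed - predicted)^2)).
manual <- sqrt(mean((test$mpg - test$predicted)^2))
print(manual)
```

In the question's setting, the matching calls would be roughly `mutate(predicted = predict(regression1, newdata = my_type_test))` followed by a separate `rmse(my_type_test, truth = quality, estimate = predicted)`. If `my_type_test` still carries a stale `regression1` column, either re-run the split or write `.env$regression1` inside `mutate()` so the name resolves to the model rather than the column.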