I'm getting a persistent error "Error: variables ‘x1’, ‘x2’, ‘x3’ were specified with different types from the fit" while trying to predict the outcome for new data using `predict()` in R. I've run this code successfully with other models, but for some reason I can't figure out what is wrong with this one. I've replicated the issue with the following code:
```r
# make data
set.seed(19870630)
n <- 1000
df <- data.frame(y = rgamma(n, shape = .5, rate = 1),
                 #runif(n, 0, 1), # throws same error
                 x1 = runif(n, 0, 100),
                 x2 = runif(n, 0, 100),
                 x3 = runif(n, -1, 1))
df$x2 <- df$x1*df$x1
# refine data by scaling
df$x1 <- scale(df$x1, center = TRUE)
df$x2 <- scale(df$x2, center = TRUE)
df$x3 <- scale(df$x3, center = TRUE)
# double check
head(df); plot(df)
# fit model
mod <- glm(y ~ x1 + x2 + x3, data = df, family = Gamma(link = "log"))
# confirm, success
summary(mod)
# make data to retain predictions
## first get realistic ranges of variables of interest, other vars will be held at mean
## (50 evenly spaced values between the observed min and max)
(x1_span <- seq(min(df$x1), max(df$x1), length.out = 50))
(x2_span <- seq(min(df$x2), max(df$x2), length.out = 50))
df_pred_x1_x2 <- data.frame(x1 = x1_span,
                            x2 = x2_span,
                            x3 = mean(df$x3))
# wrapper that returns predicted values (on the link scale, the predict.glm default)
predict_fun <- function(my_glm) {
  predict(my_glm, newdata = df_pred_x1_x2) # this is predict.glm
}
df_pred_x1_x2$y_value_pred <- predict_fun(mod) # error
# "Error: variables ‘x1’, ‘x2’, ‘x3’ were specified with different types from the fit"
# End March 8, 2021
```
Any help would be appreciated, thank you.
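This happens because `scale()` makes the variables into single-column matrices (note the `num [1:1000, 1]` in the description of `x1` below). When the model is fitted, R records the class of each variable (a one-column numeric matrix is recorded as `"nmatrix.1"`), and `predict()` checks the columns of `newdata` against those classes; the columns of `df_pred_x1_x2` are plain numeric vectors, hence the error. To be honest, I'm never sure when this is or isn't going to cause trouble ...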
```
str(df)
'data.frame':	1000 obs. of  4 variables:
 $ y : num  ...
 $ x1: num [1:1000, 1] 1.448 -1.702 -0.559 -1.147 0.732 ...
  ..- attr(*, "scaled:center")= num 49.2
  ..- attr(*, "scaled:scale")= num 28.5
 ...
```
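A minimal sketch of the mismatch, using a hypothetical toy data frame `d` and an `lm()` fit instead of the question's Gamma GLM (the class check is the same, since `predict.glm()` hands off to `predict.lm()`). It compares the classes recorded at fit time, stored in the `dataClasses` attribute of the model terms, with the class of a plain numeric column in new data:

```r
set.seed(1)
# hypothetical toy data; scale() returns a 20 x 1 matrix, and $<- keeps it as one
d <- data.frame(y = rnorm(20), x = runif(20))
d$x <- scale(d$x)
is.matrix(d$x)  # TRUE

fit <- lm(y ~ x, data = d)
# classes recorded when the model frame was built: x is "nmatrix.1"
fit_classes <- attr(terms(fit), "dataClasses")
print(fit_classes)

# new data built from a plain vector: x is an ordinary numeric vector
new_d <- data.frame(x = seq(-1, 1, length.out = 5))
is.matrix(new_d$x)  # FALSE

# predict() compares the two sets of classes; the mismatch stops with an error
res <- tryCatch(predict(fit, newdata = new_d),
                error = function(e) conditionMessage(e))
print(res)
```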
You can work around this by calling `df[] <- lapply(df, drop)` to drop the extra dimension (before you fit the model); assigning into `df[]` keeps `df` a data frame, whereas `df <- lapply(df, drop)` would turn it into a plain list. @dlaggy points out that you can also define your own scaling function (`function(x) (x-mean(x))/sd(x)`); you could also define

```r
myscale <- function(...) drop(scale(...))
```
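A sketch of that workaround applied to a shortened version of the question's setup (same kind of simulated data, but not an exact replica of the random draws; `new_data` is an illustrative name). Because the scaled columns go through `drop()` before fitting, the classes recorded in the fit are `"numeric"` and match the plain vectors in the prediction grid:

```r
set.seed(19870630)
n <- 1000
df <- data.frame(y  = rgamma(n, shape = .5, rate = 1),
                 x1 = runif(n, 0, 100),
                 x3 = runif(n, -1, 1))
df$x2 <- df$x1 * df$x1

# scale() returns one-column matrices; drop() turns them back into vectors
# while keeping the "scaled:center" / "scaled:scale" attributes
myscale <- function(...) drop(scale(...))
df$x1 <- myscale(df$x1)
df$x2 <- myscale(df$x2)
df$x3 <- myscale(df$x3)

mod <- glm(y ~ x1 + x2 + x3, data = df, family = Gamma(link = "log"))

# prediction grid: 50 values spanning x1 and x2, x3 held at its mean
new_data <- data.frame(x1 = seq(min(df$x1), max(df$x1), length.out = 50),
                       x2 = seq(min(df$x2), max(df$x2), length.out = 50),
                       x3 = mean(df$x3))
# no type mismatch now; predictions are on the link (log) scale
new_data$y_value_pred <- predict(mod, newdata = new_data)
head(new_data)
```

Using `df[] <- lapply(df, drop)` after the original `scale()` calls should have the same effect on the recorded classes.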
Note that unlike using `c()` (which I had suggested in my previous answer), which drops dimensions and all other attributes, `drop()` only drops dimensions, so you can keep your scale/center attributes with the data as you go along.
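A small illustration of that difference, using a hypothetical three-element vector: `c()` returns a bare numeric vector, while `drop()` removes only the `dim` attribute and keeps the scaling attributes that `scale()` attached. The values themselves are identical either way:

```r
# 3 x 1 matrix with values -1, 0, 1; scaled:center = 4, scaled:scale = 2
z <- scale(c(2, 4, 6))

z_c    <- c(z)     # plain vector: dim and every other attribute removed
z_drop <- drop(z)  # plain vector: only dim removed

attributes(z_c)     # NULL
attributes(z_drop)  # list holding scaled:center and scaled:scale
# same numbers, only the attributes differ
all.equal(z_c, z_drop, check.attributes = FALSE)  # TRUE
```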