Tags: r, regression, linear-regression, lm, predict

Should I provide x or log(x) to predict() if my model is y ~ log(x)?


I have a fitted lm model:

log_log_model = lm(log(price) ~ log(carat), data = diamonds)

I want to predict price using this model, but I'm not sure whether I should pass log(carat) or the raw carat value as the predictor to the predict() function.

Choice 1

exp(predict(log_log_model, data.frame(carat = log(3)),
            interval = 'predict', level = 0.99))

Choice 2

exp(predict(log_log_model, data.frame(carat = 3),
            interval = 'predict', level = 0.99))

Which one is correct?


Solution

  • Choice 2 is correct.

    For some extra confidence, let's inspect what the design matrix looks like when we make a prediction.

    ## for the diamonds dataset
    library(ggplot2)
    
    ## log-log linear model
    fit <- lm(log(price) ~ log(carat), data = diamonds)
    
    ## new data for prediction: carat on its original scale
    newdat <- data.frame(carat = 3)
    
    ## evaluate the design matrix used for prediction
    Xp <- model.matrix(delete.response(terms(fit)), data = newdat)
    Xp
    #  (Intercept) log(carat)
    #1           1   1.098612
    

    See it? Because log() is part of the model formula, carat = 3 is automatically evaluated to log(carat) = log(3), so newdata should hold carat on its original scale.
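
    As a further check, here is a minimal sketch that compares both choices against predictions computed by hand from the fitted coefficients. It assumes the ggplot2 package is installed (only for the diamonds data). The names p_raw, p_logged, manual_raw and manual_logged are made up for this illustration.

    ```r
    ## assumption: ggplot2 is installed; it is used only for the diamonds data
    diamonds <- ggplot2::diamonds

    ## same log-log model as above
    fit <- lm(log(price) ~ log(carat), data = diamonds)
    b <- coef(fit)

    ## Choice 2: raw carat; predict() applies log() from the formula
    p_raw <- predict(fit, data.frame(carat = 3))

    ## Choice 1: carat already logged; the formula logs it a second time
    p_logged <- predict(fit, data.frame(carat = log(3)))

    ## hand-computed predictions on the log(price) scale
    manual_raw    <- b[[1]] + b[[2]] * log(3)
    manual_logged <- b[[1]] + b[[2]] * log(log(3))

    ## each call should return TRUE
    all.equal(unname(p_raw), manual_raw)
    all.equal(unname(p_logged), manual_logged)
    ```

    If both comparisons hold, Choice 1 is in effect predicting for carat = log(3), about 1.1, rather than carat = 3. Either way the result is on the log(price) scale, which is why the question wraps predict() in exp().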