Tags: r, plot, glm, predict

R: Why aren't the predicted values of my glm forming a curved line?


I have a script that fits a glm and uses it to normalise a dataset to values between 0 and 1, after which I plot the relationship. I've done this for multiple datasets and the fitted line is always curved (like the first graph), but for this one particular dataset the "curve" is just 3 straight lines (second graph). I'm guessing it has something to do with the newdata argument in predict, but I'm not sure.

[Graph 1: curved line] [Graph 2: straight lines]

My code:

# turn off scientific notation
options(scipen = 999)

# recreating the data
IV_BP <- structure(list(Breakpoints = c("Min", "BP1", "BP2", "BP3", "BP4", "Max"),
                        SES = c(-1.8, -0.3, -0.1, 0.1, 0.3, 0.8),
                        Normalised_value = c(0,0.2, 0.4, 0.6, 0.8, 1)),
                   class = "data.frame", row.names = c(NA, -6L))

IV_df <- structure(list(SES = c(-0.006, 0.078, 0.028, -0.066, 0.041, -0.025, 
                                0.006, -0.021, -0.013, -0.145, -0.065, 0.026, 0.068, -0.22, 0.138, 
                                0.019, 0.174, 0.107, 0.339, 0.219, 0.093, -0.057, -0.19, 0.01, 
                                0.085, -0.011, -0.075, -0.113, -0.019, 0.141, -0.045, -0.258, 
                                -0.02, -0.178, -0.142, -0.067, 0.1, -0.155, 0.007, -0.18, -0.258, 
                                -0.497)), class = "data.frame", row.names = c(NA, -42L))

# make glm
glmfit <- glm(Normalised_value~SES,data=IV_BP,family = quasibinomial())

# use glm to transform values
IV_df$CC_Transformed <- predict(glmfit,newdata=IV_df,type="response")

# make a graph
plot(IV_BP$SES, IV_BP$Normalised_value,
     xlab = "Socioeconomic Status Index Score",
     ylab = "Normalised Values",
     xlim = c(-2, 2),
     pch = 19,
     col = "blue",
     panel.first =
       c(abline(h = 0, col = "lightgrey"),
         abline(h = 0.2, col = "lightgrey"),
         abline(h = 0.4, col = "lightgrey"),
         abline(h = 0.6, col = "lightgrey"),
         abline(h = 0.8, col = "lightgrey"),
         abline(h = 1, col = "lightgrey"),
         lines(-2:2,predict(glmfit,newdata=data.frame(SES=-2:2),type="response"),
      col = "lightblue",
      lwd = 5)))

Solution

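  • -2:2 gives only five x values (-2, -1, 0, 1, 2), and lines() joins consecutive points with straight segments, so there are too few points to trace the curve. Increase the resolution with seq(), e.g. in steps of 0.1.

    Also plot the line first, then overplot the points.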

    # make glm
    glmfit <- glm(Normalised_value ~ SES, data = IV_BP, family = quasibinomial())
    
    pred_df <- data.frame(SES = seq(-2, 2, by = 0.1))
    pred_df$CC_Transformed <- predict(glmfit, newdata = pred_df, type = "response")
    
    # make a graph
    plot(CC_Transformed ~ SES, data = pred_df,
         type = "l",
         xlab = "Socioeconomic Status Index Score",
         ylab = "Normalised Values",
         xlim = c(-2, 2),
         lwd = 5,
         col = "lightblue", 
         panel.first = abline(h = seq(0, 1, by = 0.2), col = "lightgrey"))
    
    points(Normalised_value ~ SES, data = IV_BP, pch = 19, col = "blue")
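
To see the effect of grid resolution directly, here is a minimal sketch that predicts on both grids and draws them on one plot. The names coarse, fine and fit are only illustrative (they are not in the question), and the breakpoint data is copied from the question without the Breakpoints column.

```r
# Breakpoint data from the question (Breakpoints labels omitted)
IV_BP <- data.frame(
  SES = c(-1.8, -0.3, -0.1, 0.1, 0.3, 0.8),
  Normalised_value = c(0, 0.2, 0.4, 0.6, 0.8, 1)
)
glmfit <- glm(Normalised_value ~ SES, data = IV_BP, family = quasibinomial())

# Two prediction grids over the same range (hypothetical names)
coarse <- data.frame(SES = -2:2)                  # 5 points
fine   <- data.frame(SES = seq(-2, 2, by = 0.1))  # 41 points

coarse$fit <- predict(glmfit, newdata = coarse, type = "response")
fine$fit   <- predict(glmfit, newdata = fine, type = "response")

# Fine grid: looks smooth. Coarse grid: visibly a polyline.
plot(fit ~ SES, data = fine, type = "l", lwd = 5, col = "lightblue",
     xlab = "Socioeconomic Status Index Score",
     ylab = "Normalised Values")
lines(fit ~ SES, data = coarse, lty = 2, col = "red")
points(Normalised_value ~ SES, data = IV_BP, pch = 19, col = "blue")
```

Both lines come from the same fitted model; only the number of x values passed to predict differs. The earlier datasets probably looked curved because their -2:2 style grid happened to be fine enough relative to how quickly the fitted curve changes.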
    
