r ggplot2 glm

Change axis labels in ggplot2 while a glm fit is plotted


I have the following problem. My data looks like this:

NO              Income_before_taxes Income_aftere_taxes educationLevel
1:               27757               27313              1
2:               40147               38148              2
3:               52240               47880              3
4:               63061               57027              4
5:               92409               78738              5
6:              132985              106661              6

I would like to plot a glm fit of Income_aftere_taxes ~ educationLevel.

I do this with the following code:

ggbox <- ggplot(data = fullDataSet, aes(x = educationLevel, y = Income_aftere_taxes))
ggbox <- ggbox + geom_point()
ggbox <- ggbox + geom_smooth(method = "glm")

The result looks like this: [image]

However, when I try to change the x-axis labels to c("lower than high school", "high school", "college", "associate degree", "bachelor", "master and PhD"), it does not work. To set the labels with scale_x_discrete, I need to convert the x-axis variable "educationLevel" to a factor. This, however, destroys the glm fit.

So, to sum up, I can either plot a glm fit or change the x-axis labels, but I need both on the same plot. Is there a way to achieve that?


Solution

  • Try this. The labels shown here are only placeholders. This assumes that educationLevel stays a numeric vector and is not converted to a factor or character. First we create a character vector for the labels.

    mylabels <- c("High School", "No High School", "Some College", "No College",
                  "College", "PhD", "Postdoc")


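    Then we use it on the x-axis with scale_x_continuous, which keeps the axis numeric so the glm fit is unaffected. mylabels has seven entries but the question has only six education levels, so [-7] drops the seventh entry to make the number of labels match the number of breaks. You can change the labels as you wish.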

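    library(dplyr)    # provides the %>% pipe
    library(ggplot2)
    # df stands for the question's data (fullDataSet); educationLevel must be numeric
    df %>%
      ggplot(aes(x = educationLevel, y = Income_aftere_taxes)) +
      geom_point() +
      geom_smooth(method = "glm") +
      # one break per education level, each paired with one label
      scale_x_continuous(breaks = 1:6, labels = mylabels[-7])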
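
    For reference, here is a minimal self-contained sketch of the same approach using the six labels from the question instead of the placeholder ones. The data frame is retyped from the table in the question, and the name eduLabels is only an illustrative choice, not something from the original post.

```r
library(ggplot2)

# Data retyped from the question's table
fullDataSet <- data.frame(
  educationLevel      = 1:6,
  Income_before_taxes = c(27757, 40147, 52240, 63061, 92409, 132985),
  Income_aftere_taxes = c(27313, 38148, 47880, 57027, 78738, 106661)
)

# Labels from the question, one per education level (eduLabels is a made-up name)
eduLabels <- c("lower than high school", "high school", "college",
               "associate degree", "bachelor", "master and PhD")

p <- ggplot(fullDataSet, aes(x = educationLevel, y = Income_aftere_taxes)) +
  geom_point() +
  # glm() defaults to the gaussian family, so this is a straight-line fit
  geom_smooth(method = "glm") +
  # keep x numeric; only the text shown at each break changes
  scale_x_continuous(breaks = seq_along(eduLabels), labels = eduLabels)

print(p)
```

    If the longer labels overlap, rotating them with theme(axis.text.x = element_text(angle = 45, hjust = 1)) is one option that may help.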