I have the following problem. My data looks like this:
NO Income_before_taxes Income_aftere_taxes educationLevel
1: 27757 27313 1
2: 40147 38148 2
3: 52240 47880 3
4: 63061 57027 4
5: 92409 78738 5
6: 132985 106661 6
I would like to plot a glm fit of Income_aftere_taxes ~ educationLevel.
I do this with the following code:
ggbox <- ggplot(data = fullDataSet, aes(x = educationLevel, y = Income_aftere_taxes))
ggbox <- ggbox + geom_point()
ggbox <- ggbox + geom_smooth(method = "glm")
The result looks like this:
However, if I want to change the axis labels of the X axis to c("lower than high school", "high school", "college", "associate degree", "bachelor", "master and PhD"), this does not work.
To set the labels with scale_x_discrete, I need to convert the x-axis variable educationLevel to a factor. This, however, destroys the glm fit.
So, to sum up, I can either plot a glm fit or change the x axis labels. But I need both options simultaneously on one plot. Is there a way to achieve that?
Try this. The labels shown here are only representative. This assumes educationLevel stays a numeric vector, not a factor or character. First, create a character vector for the labels:
mylabels<-c("High School","No High School","Some College","No College",
"College","PhD","Postdoc")
Then use it on the x-axis with scale_x_continuous. The [-7] drops the seventh label so that the six remaining labels match the six breaks, as in the question. You can change the labels as you wish.
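library(dplyr)    # provides the %>% pipe
library(ggplot2)
# df is assumed to be the question's data (fullDataSet there); educationLevel must be numeric
df %>%
  ggplot(aes(x = educationLevel, y = Income_aftere_taxes)) +
  geom_point() +
  geom_smooth(method = "glm") +
  # label the numeric breaks 1..6 with the first six entries of mylabels
  scale_x_continuous(breaks = 1:6, labels = mylabels[-7])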
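For a reproducible version using the question's own data and labels, something like the following should work. It is a minimal sketch: the data frame is retyped from the table in the question (only the two columns that are plotted), and eduLabels is a name introduced here purely for illustration.

```r
library(ggplot2)

# Data retyped from the question's table (only the plotted columns)
fullDataSet <- data.frame(
  educationLevel = 1:6,
  Income_aftere_taxes = c(27313, 38148, 47880, 57027, 78738, 106661)
)

# Labels requested in the question, in level order (eduLabels is a hypothetical name)
eduLabels <- c("lower than high school", "high school", "college",
               "associate degree", "bachelor", "master and PhD")

# educationLevel stays numeric, so the glm fit is unaffected;
# only the tick labels at positions 1..6 are replaced
p <- ggplot(fullDataSet, aes(x = educationLevel, y = Income_aftere_taxes)) +
  geom_point() +
  geom_smooth(method = "glm") +
  scale_x_continuous(breaks = seq_along(eduLabels), labels = eduLabels)

print(p)
```

Using seq_along(eduLabels) for the breaks keeps the number of breaks and labels in step, so no [-7] style trimming is needed if the label vector changes.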