Given an example boxplot like this in ggplot2:
ggplot(diamonds, aes(carat, price)) +
  geom_boxplot(aes(group = cut_width(carat, 0.25)), outlier.alpha = 0.1) +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), n = 40, se = TRUE,
              color = "red", aes(group = 1), size = 1.5)
I get an image that looks like this:
However, the stat_smooth line is heavily influenced by the number of points in each of the carat categories. I would prefer to treat each category equally, which to my mind means weighting each point by the inverse of the total number of points in its carat category. (So at 5 the point would have a weight of 1, and at 1 the point would have a weight of 1/aBigNumber.) I've tried adding the weight aesthetic to the plot, but it breaks the boxplot. I've tried adding the weight to the smooth, but I get an error:
Error: ggplot2 doesn't know how to deal with data of class uneval
So, how do I weight a smoothing function so that the categories are treated equally (that is, each point weighted inversely to the number of points in its category), and still keep the boxplot in the output?
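You could summarise each carat bin to its mean carat and mean price, fit the smooth to those summary points, and draw the boxplot from the original data, something like this: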
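library(ggplot2)
library(dplyr)

# One row per carat bin: the bin's mean carat and mean price
diamonds2 <- diamonds %>%
  mutate(cutcarat = cut_width(carat, 0.25)) %>%
  group_by(cutcarat) %>%
  summarise(carat = mean(carat), price = mean(price))

ggplot() +
  # Boxplots are drawn from the full data set
  geom_boxplot(data = diamonds,
               aes(x = carat, y = price, group = cut_width(carat, 0.25)),
               outlier.alpha = 0.1) +
  # The smooth is fitted to the per-bin means, so every bin counts equally
  geom_smooth(data = diamonds2,
              aes(x = carat, y = price), method = "lm",
              formula = y ~ poly(x, 2), n = 40, se = TRUE, color = "red", size = 1.5)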
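If you would rather keep every point and weight it the way the question describes, a sketch along these lines might work. It assumes the weight aesthetic is mapped only inside the smoothing layer, so the boxplot layer never sees it. The names diamonds_w and w are made up for this example. As far as I can tell, the "class uneval" error usually means an aes() call ended up in the data argument (for instance a second unnamed aes() in the same layer), which is why both group and weight go into a single aes() here.

```r
library(ggplot2)
library(dplyr)

# Weight each diamond by 1 / (number of diamonds in its carat bin),
# so that every bin carries a total weight of 1 in the fit.
diamonds_w <- diamonds %>%
  mutate(cutcarat = cut_width(carat, 0.25)) %>%
  group_by(cutcarat) %>%
  mutate(w = 1 / n()) %>%
  ungroup()

p <- ggplot(diamonds_w, aes(carat, price)) +
  # Boxplot layer: no weight aesthetic here, so it is drawn as before
  geom_boxplot(aes(group = cutcarat), outlier.alpha = 0.1) +
  # Smoothing layer: group and weight share one aes() call
  stat_smooth(aes(group = 1, weight = w), method = "lm",
              formula = y ~ poly(x, 2), n = 40, se = TRUE,
              color = "red", size = 1.5)

p
```

The two approaches should not be expected to give exactly the same curve. The weighted fit still treats every diamond as an observation, so its confidence band will probably be much narrower than the one fitted to the handful of per-bin means.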