Tags: r, ggplot2, boxplot, smoothing

How do I weight stat_smooth by the inverse of the number of values in ggplot2?


Given an example boxplot like this in ggplot2:

library(ggplot2)

ggplot(diamonds, aes(carat, price)) +
  geom_boxplot(aes(group = cut_width(carat, 0.25)), outlier.alpha = 0.1) +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), n = 40, se = TRUE, color = "red", aes(group = 1), size = 1.5)

I get a plot with one boxplot per carat bin and a red quadratic smooth drawn over them.

However, the stat_smooth line is heavily influenced by the number of points in each of the carat categories. I would prefer to treat the categories equally, which to my mind means weighting each point by the inverse of the total number of points in its category. (So, at 5, the point would have a weight of 1, and at 1, each point would have a weight of 1/aBigNumber.) I've tried adding the weight aesthetic to the plot, but it breaks the boxplot. I've tried adding the weight to the smooth, but I get an error:

Error: ggplot2 doesn't know how to deal with data of class uneval

So, how do I weight a smoothing function so that the categories are treated equally (that is inverse to the number of points in the category), and still keep the boxplot in the output?


Solution

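  • You could fit the smooth to a summarised data set with one point per carat bin (the mean carat and mean price), so that each bin counts once regardless of how many diamonds it contains: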

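    library(ggplot2)
    library(dplyr)

    # One summary point per 0.25-wide carat bin: mean carat and mean price
    diamonds2 <- diamonds %>%
      mutate(cutcarat = cut_width(carat, 0.25)) %>%
      group_by(cutcarat) %>%
      summarise(carat = mean(carat), price = mean(price))

    ggplot() +
      # Boxplots are drawn from the full data set
      geom_boxplot(data = diamonds,
                   aes(x = carat, y = price, group = cut_width(carat, 0.25)),
                   outlier.alpha = 0.1) +
      # The smooth is fitted to the bin means only, so every bin counts once
      geom_smooth(data = diamonds2,
                  aes(x = carat, y = price), method = "lm",
                  formula = y ~ poly(x, 2), n = 40, se = TRUE, color = "red", size = 1.5)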
    

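
If you would rather keep every point in the fit and weight it as the question describes, a minimal sketch is below. It assumes the weight can be precomputed as a column and mapped only inside the smooth layer, so the boxplot layer never sees it. The names `diamonds_w`, `bin`, `w` and `p` are made up for illustration. The "class uneval" error may come from passing a second unnamed `aes()` to `stat_smooth`, which then gets matched to the `data` argument; putting `group` and `weight` in a single `aes()` call avoids that.

```r
library(ggplot2)
library(dplyr)

# Give every row a weight of 1 / (number of rows in its carat bin),
# so the weights within each bin sum to 1.
diamonds_w <- diamonds %>%
  mutate(bin = cut_width(carat, 0.25)) %>%
  group_by(bin) %>%
  mutate(w = 1 / n()) %>%
  ungroup()

p <- ggplot(diamonds_w, aes(carat, price)) +
  # The boxplot layer has no weight mapping, so it is drawn as before
  geom_boxplot(aes(group = bin), outlier.alpha = 0.1) +
  # weight is mapped only here; stat_smooth passes it on to lm() as weights
  stat_smooth(aes(group = 1, weight = w),
              method = "lm", formula = y ~ poly(x, 2),
              n = 40, se = TRUE, color = "red",
              linewidth = 1.5)  # use size = 1.5 on ggplot2 older than 3.4.0

print(p)
```

This is not the same fit as the one above: here the quadratic is a weighted least-squares fit to all points, whereas the summarised version fits an unweighted quadratic to the bin means. Both keep sparse bins from being swamped by dense ones, but the curves and confidence bands will differ.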