
Smoothing line with a categorical variable in ggplot2?


I have a huge data set; this is a sample.

dat <- data.frame(basket_size_group = c("[0,2]", "[0,2]", "(2,4]", "(2,4]", "(4,6]"),
                  channel = c("offline", "online/mobile", "offline", "online/mobile", "offline"),
                  pct_trips = c(0.004, 0.038, 0.0028, 0.0082, 0.0037))

Using ggplot2, I would like to plot a smoothing line through the data. The x-axis is basket_size_group, the y-axis is pct_trips, and channel is the grouping variable. The problem is that basket_size_group is a categorical variable. How can I create one smoothing line per channel with ggplot2?


Solution

  • If you want to use a loess smooth, you will need more data. As it stands, stat_smooth() fails with the error:

    Computation failed in `stat_smooth()`:
    NA/NaN/Inf in foreign function call (arg 5)
    

    The error goes away if you specify method = "lm".

    You also have to set group = channel explicitly in the stat_smooth() layer. You could set it in the top-level ggplot() call instead, but without it stat_smooth() groups by the combination of the discrete variables x and color, so it never sees a whole channel at once.

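    library(ggplot2)

    # dat is the sample data frame from the question;
    # make the x variable a factor so the bins plot in order
    dat$basket_size_group <- factor(dat$basket_size_group, levels = c("[0,2]", "(2,4]", "(4,6]"))
    
    # one linear fit per channel, because group = channel overrides the default grouping
    ggplot(dat, aes(basket_size_group, pct_trips, color = channel)) +
        geom_point() +
        stat_smooth(aes(group = channel), method = "lm")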
    

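To see why the explicit group matters, here is a minimal sketch that inspects the groups ggplot2 assigns with and without group = channel. It rebuilds the sample data from the question; the names p_default, p_grouped, n_default and n_grouped are made up for illustration, and layer_data() is used only to peek at the built layer.

```r
library(ggplot2)

# Sample data from the question
dat <- data.frame(
  basket_size_group = c("[0,2]", "[0,2]", "(2,4]", "(2,4]", "(4,6]"),
  channel = c("offline", "online/mobile", "offline", "online/mobile", "offline"),
  pct_trips = c(0.004, 0.038, 0.0028, 0.0082, 0.0037)
)
dat$basket_size_group <- factor(dat$basket_size_group,
                                levels = c("[0,2]", "(2,4]", "(4,6]"))

# Default grouping: every observed combination of the discrete
# variables (x and color) becomes its own group
p_default <- ggplot(dat, aes(basket_size_group, pct_trips, color = channel)) +
  geom_point()

# Explicit grouping: one group per channel
p_grouped <- ggplot(dat, aes(basket_size_group, pct_trips,
                             color = channel, group = channel)) +
  geom_point()

# Count the distinct groups in the built data of the first layer
n_default <- length(unique(layer_data(p_default)$group))
n_grouped <- length(unique(layer_data(p_grouped)$group))

print(c(default = n_default, grouped = n_grouped))
```

For this sample the default grouping should give one group per point, while the explicit version gives one group per channel. A smoother fitted within single-point groups has nothing to fit, which is why the line only appears once group = channel is set.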