Tags: r, ggplot2, scatter, segment

ggplot add segments to scatter plot according to factors


I have the following code:

set.seed(100)
values<-c(rnorm(200,10,1),rnorm(200,2.1,1),rnorm(250,6,1),rnorm(75,2.1,1),rnorm(50,9,1),rnorm(210,2.05,1))
rep1<-rep(3,200)
rep2<-rep(0,200)
rep3<-rep(1,250)
rep4<-rep(0,75)
rep5<-rep(2,50)
rep6<- rep(0,210)
group<-c(rep1,rep2,rep3,rep4,rep5,rep6)
df<-data.frame(values,group)

I would like to plot these data as a scatter plot (like the attached plot) and add segments. The y value of each segment should be the mean of the data for a given group. In addition, the segments should have a different color depending on the factor (group). Is there an efficient way to do this with ggplot? Many thanks.


Solution

  • We can do this by augmenting your data a little. We'll use dplyr to get the mean by group, and we'll create two helper variables: one that gives the observation index, and one that increments by one each time the group changes (which will be helpful to get the segments you want) [1]:

    library(dplyr)
    
    df <- df %>%
        mutate(idx = seq_along(values), group = as.integer(group)) %>%
        group_by(group) %>%
        mutate(m = mean(values)) %>%
        ungroup() %>%
        mutate(group2 = cumsum(group != lag(group, default = -1)))
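
    To see what the group2 step does, here is a minimal sketch on a toy vector (the names changed and run_id are illustrative only, not part of the answer's code). It assumes dplyr is installed, since lag() with a default argument comes from dplyr:

    ```r
    library(dplyr)

    # Toy group labels: group 0 appears in two separate runs
    group <- c(3, 3, 0, 0, 1, 0, 0)

    # TRUE wherever the label differs from the previous one;
    # default = -1 makes the first element count as a change
    changed <- group != lag(group, default = -1)

    # The running total of changes gives one id per consecutive run
    run_id <- cumsum(changed)
    print(run_id)
    # 1 1 2 2 3 4 4
    ```

    The two runs of group 0 get different ids (2 and 4), so a line grouped by this id is broken between them instead of being drawn across the groups in between.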
    

    Now we can make the plot. Using geom_line() grouped by group2, which changes every time the group changes, draws a separate segment for each consecutive run of a group. Then we just color by (a discretized version of) group:

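    library(ggplot2)

    ggplot(data = df, mapping = aes(x = idx, y = values)) +
        geom_point(shape = 1, color = "blue") +
        # one line per run (group2), drawn at the group mean m;
        # linewidth needs ggplot2 >= 3.4.0 (use size = 2 on older versions)
        geom_line(aes(x = idx, y = m, group = group2, color = as.factor(group)),
                  linewidth = 2) +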
        scale_color_manual(values = c("red", "black", "green", "blue"),
                           name = "group") +
        theme_bw()
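
    Since the question asks for segments, a variant with geom_segment() is also possible. The following is only a sketch of that alternative, not the approach used above: it collapses each consecutive run to one row holding its start index, end index, and group mean. The names segs, run, xstart, and xend are made up for illustration, and linewidth assumes ggplot2 >= 3.4.0 (use size on older versions).

    ```r
    library(dplyr)
    library(ggplot2)

    # Rebuild the data from the question
    set.seed(100)
    values <- c(rnorm(200, 10, 1), rnorm(200, 2.1, 1), rnorm(250, 6, 1),
                rnorm(75, 2.1, 1), rnorm(50, 9, 1), rnorm(210, 2.05, 1))
    group <- c(rep(3, 200), rep(0, 200), rep(1, 250),
               rep(0, 75), rep(2, 50), rep(0, 210))
    df <- data.frame(values, group) %>%
        mutate(idx = seq_along(values))

    # One row per consecutive run: first/last index and the group-wide mean
    segs <- df %>%
        mutate(run = cumsum(group != lag(group, default = -1))) %>%
        group_by(group) %>%
        mutate(m = mean(values)) %>%   # mean over the whole group, as above
        group_by(run, group) %>%
        summarise(xstart = min(idx), xend = max(idx), m = first(m),
                  .groups = "drop")

    p <- ggplot(df, aes(x = idx, y = values)) +
        geom_point(shape = 1, color = "blue") +
        geom_segment(data = segs,
                     aes(x = xstart, xend = xend, y = m, yend = m,
                         color = as.factor(group)),
                     inherit.aes = FALSE, linewidth = 2) +
        scale_color_manual(values = c("red", "black", "green", "blue"),
                           name = "group") +
        theme_bw()
    print(p)
    ```

    This draws six segments from a six-row data frame instead of a line through all 985 points, which may be easier to reuse (for labels, say) but should look the same as the geom_line() version.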
    



    [1] See https://stackoverflow.com/a/42705593/8386140