I have the following code:
set.seed(100)
values<-c(rnorm(200,10,1),rnorm(200,2.1,1),rnorm(250,6,1),rnorm(75,2.1,1),rnorm(50,9,1),rnorm(210,2.05,1))
rep1<-rep(3,200)
rep2<-rep(0,200)
rep3<-rep(1,250)
rep4<-rep(0,75)
rep5<-rep(2,50)
rep6<- rep(0,210)
group<-c(rep1,rep2,rep3,rep4,rep5,rep6)
df<-data.frame(values,group)
I would like to plot these data as a scatter plot (like the attached plot) and add horizontal segments. The y value of each segment should be the mean of the data for the corresponding group. In addition, the segments should have a different color depending on the factor (group). Is there an efficient way to do this with ggplot?
Many thanks
We can do this by augmenting your data a little. We'll use dplyr to get the mean by group, and we'll create two helper variables: one that gives the observation index, and one that increments by one each time the group changes (which will be helpful to get the segments you want):
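library(dplyr)
library(ggplot2)
df <- df %>%
  # idx: observation index, used as the x axis
  mutate(idx = seq_along(values), group = as.integer(group)) %>%
  group_by(group) %>%
  # m: mean of values within each group
  mutate(m = mean(values)) %>%
  ungroup() %>%
  # group2: counter that increments whenever group changes
  mutate(group2 = cumsum(group != lag(group, default = -1)))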
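To see what the group2 step does, here is a minimal sketch on a toy vector (the vector g and the name run_id are made up for illustration). The comparison with lag() is TRUE at the first element of every run, and cumsum() turns those flags into a run counter, so two separate runs of the same group get different ids:

```r
library(dplyr)

# Hypothetical toy labels: group 0 appears in two separate runs
g <- c(3L, 3L, 0L, 0L, 1L, 0L, 0L, 2L)

# TRUE wherever the label differs from the previous one (the first
# element is compared against the default of -1, so it is always TRUE)
starts_run <- g != lag(g, default = -1L)

# Cumulative sum of the flags gives one id per consecutive run
run_id <- cumsum(starts_run)
run_id
# 1 1 2 2 3 4 4 5
```

This is why geom_line() is grouped by group2 rather than by group: grouping by group alone would join the separate runs of group 0 into one line.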
Now we can make the plot. Using geom_line() with grouping by group2, which changes every time the group changes, makes the segments you want. Then we just color by (a discretized version of) group:
ggplot(data = df, mapping = aes(x = idx, y = values)) +
  geom_point(shape = 1, color = "blue") +
  # one line per run (group2), colored by group;
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_line(aes(x = idx, y = m, group = group2, color = as.factor(group)),
            linewidth = 2) +
  scale_color_manual(values = c("red", "black", "green", "blue"),
                     name = "group") +
  theme_bw()
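As a possible alternative, not part of the approach above, the segments can be drawn with geom_segment() from a small summary table holding one row per run. The sketch below rebuilds the same data so it runs on its own; the names segs, x_start and x_end are made up for illustration. Note that m is still the mean over the whole group, so the three runs of group 0 sit at the same height.

```r
library(dplyr)
library(ggplot2)

# Same data as in the question, built more compactly (sd defaults to 1)
set.seed(100)
df <- data.frame(
  values = c(rnorm(200, 10), rnorm(200, 2.1), rnorm(250, 6),
             rnorm(75, 2.1), rnorm(50, 9), rnorm(210, 2.05)),
  group = rep(c(3, 0, 1, 0, 2, 0), times = c(200, 200, 250, 75, 50, 210))
)

df <- df %>%
  mutate(idx = seq_along(values), group = as.integer(group)) %>%
  group_by(group) %>%
  mutate(m = mean(values)) %>%
  ungroup() %>%
  mutate(group2 = cumsum(group != lag(group, default = -1)))

# One row per run: where it starts, where it ends, and its group mean
segs <- df %>%
  group_by(group2, group) %>%
  summarise(x_start = min(idx), x_end = max(idx), m = first(m),
            .groups = "drop")

p <- ggplot(df, aes(x = idx, y = values)) +
  geom_point(shape = 1, color = "blue") +
  geom_segment(data = segs,
               aes(x = x_start, xend = x_end, y = m, yend = m,
                   color = as.factor(group)),
               linewidth = 2) +  # linewidth assumes ggplot2 >= 3.4.0
  scale_color_manual(values = c("red", "black", "green", "blue"),
                     name = "group") +
  theme_bw()
p
```

With only six rows in segs, this draws six segments instead of a line through 985 points, which may be easier to adjust (for example, to shorten or label individual segments).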