I have the following data in long form:
data <- '"","n","variable","value"
"1",1,"adjr2",0.0365013693015789
"2",2,"adjr2",0.0514307495746085
"3",3,"adjr2",0.0547096973547058
"4",4,"adjr2",0.0552737311430782
"5",5,"adjr2",0.0552933455488706
"6",6,"adjr2",0.0552904097804204
"7",1,"cp",631.119186022639
"8",2,"cp",132.230096988504
"9",3,"cp",23.4429422708563
"10",4,"cp",5.55840294833615
"11",5,"cp",5.9017131979017
"12",6,"cp",7
"13",1,"bic",-1156.56144387716
"14",2,"bic",-1641.2046046544
"15",3,"bic",-1741.38235791823
"16",4,"bic",-1750.90145310605
"17",5,"bic",-1742.19643112204
"18",6,"bic",-1732.73634326858'
df <- read.csv(text=data)
I want to create a point plot for every variable. Currently, I'm doing this with ggplot2:
ggplot(df) + geom_point(aes(x = n, y = value, fill = variable)) +
facet_grid(variable ~ ., scale="free_y")
I would now like to highlight one point in each subplot with a different colour. I cannot figure out how to add this to the current geom_point call; is it even possible?
For example, how would I highlight the maximum in the first subplot and the minimum in the other two? Like this, for the first one:
I found a way to do it manually with three separate plots which are then joined in a grid, but that solution is 25 lines and there's a lot of repeated code. Is there a way to do it by just slightly modifying the above snippet?
(By the way, the minimum and maximum are found with which.min(df$value[df$variable == 'cp']), etc.)
You could add a column that marks the maximum or minimum value in each facet. The code below marks the maximum value in facets where a linear regression of value on n has a positive slope, and the minimum value where the slope is negative. The new column is then mapped to the colour aesthetic to set the point colours. (You can also make the highlighted points larger and/or give them a different point marker by mapping the new column to the size and shape aesthetics, respectively.)
library(dplyr)
library(ggplot2)
df = df %>%
  group_by(variable) %>%                       # Group by the faceting variable
  mutate(highlight = coef(lm(value ~ n))[2],   # Get slope for each facet
         highlight = ifelse(highlight > 0,     # Mark max or min value, depending on slope
                            ifelse(value==max(value),"Y","N"),
                            ifelse(value==min(value),"Y","N")))
ggplot(df) +
  geom_point(aes(x = n, y = value, colour=highlight), size=2, show.legend=FALSE) +
  facet_grid(variable ~ ., scales="free_y") +
  scale_colour_manual(values=c(N="black", Y="red")) +   # Named so colours don't depend on level order
  theme_bw()
You can do this without permanently adding the new column to your data frame by piping the data frame directly to ggplot instead of saving the updated data frame first:
df %>%
  group_by(variable) %>%
  mutate(highlight = coef(lm(value ~ n))[2],
         highlight = ifelse(highlight > 0,
                            ifelse(value==max(value),"Y","N"),
                            ifelse(value==min(value),"Y","N"))) %>%
  ggplot() +
  geom_point(aes(x=n, y=value, colour=highlight), size=2, show.legend=FALSE) +
  facet_grid(variable ~ ., scales="free_y") +
  scale_colour_manual(values=c("black","red")) +
  theme_bw()
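If you would rather not rely on the regression slope to decide between maximum and minimum, you could state the rule for each facet explicitly. The following is only a sketch under that assumption: use_max and marked are names made up for this illustration, and the data frame is rebuilt inline from the values in the question so that the snippet runs on its own. It also maps the new column to size, as mentioned above.

```r
library(dplyr)
library(ggplot2)

# Same values as in the question, rebuilt inline so the sketch is self-contained
df <- data.frame(
  n = rep(1:6, times = 3),
  variable = rep(c("adjr2", "cp", "bic"), each = 6),
  value = c(0.0365013693015789, 0.0514307495746085, 0.0547096973547058,
            0.0552737311430782, 0.0552933455488706, 0.0552904097804204,
            631.119186022639, 132.230096988504, 23.4429422708563,
            5.55840294833615, 5.9017131979017, 7,
            -1156.56144387716, -1641.2046046544, -1741.38235791823,
            -1750.90145310605, -1742.19643112204, -1732.73634326858),
  stringsAsFactors = FALSE
)

# Assumption: an explicit lookup saying which extreme to highlight per facet
# (TRUE = maximum, FALSE = minimum)
use_max <- c(adjr2 = TRUE, cp = FALSE, bic = FALSE)

marked <- df %>%
  group_by(variable) %>%
  # Within each group, pick max or min according to the lookup
  mutate(highlight = if (use_max[[variable[1]]]) value == max(value)
                     else value == min(value)) %>%
  ungroup()

p <- ggplot(marked, aes(x = n, y = value)) +
  geom_point(aes(colour = highlight, size = highlight), show.legend = FALSE) +
  facet_grid(variable ~ ., scales = "free_y") +
  scale_colour_manual(values = c(`FALSE` = "black", `TRUE` = "red")) +
  scale_size_manual(values = c(`FALSE` = 2, `TRUE` = 3)) +
  theme_bw()
p
```

The lookup makes the intent visible at a glance and does not depend on the trend of the data, at the cost of having to list every facet by name.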