I am trying to draw a scatterplot of two variables from a large time-series dataset in R, and I would like to highlight the data from one of the months and bring it to the front. I have tried some solutions suggested previously in the forums, but they do not seem to work (maybe because the questions are a bit old and some arguments have changed in newer versions). So far I have this:
library(ggplot2)
library(lubridate)  # provides month()
set.seed(123)
date <- seq(as.POSIXct("2022-04-01 00:00:00"), as.POSIXct("2022-10-31 23:00:00"), by = "hour")
t <- abs(rnorm(length(date)))
y <- exp(t) + rnorm(length(date), mean = 0, sd = 3)
df <- data.frame(date = date, t = t, y = y)
df$month <- month(df$date)
highlight_month <- 1
non_highlighted_colors <- rep("grey", length(unique(df$month)))
non_highlighted_colors[highlight_month] <- "red"
df$order <- ifelse(df$month == highlight_month, 1, 2)
ggplot(df, aes(t, y)) +
  geom_point(aes(color = factor(month), order = order)) +
  scale_color_manual(values = non_highlighted_colors) +
  labs(color = "Month") +
  theme_minimal()
The first thing I get is a warning that order has been ignored. I think this may be because setting highlight_month to 1 in the code corresponds to month 4 in the data frame, so when order is computed it looks for January, which is not in the data.
Is this the reason the code is not working?
Thank you for any suggestions.
You can make things simpler by mapping the colour to a condition and specifying a manual colour scale:
ggplot(df, aes(t, y)) +
  geom_point(aes(colour = month == 4)) +
  scale_colour_manual(values = c("grey", "red")) +
  labs(colour = "Month") +
  theme_minimal()
But you probably want to bring the highlighted points to the front, so you'll need to split the plotting into two geom_point() calls to make sure that the highlighted points get drawn after (i.e. on top of) the grey ones:
ggplot(df, aes(t, y)) +
  geom_point(data = df[df$month != 4, ], aes(colour = month == 4)) +
  geom_point(data = df[df$month == 4, ], aes(colour = month == 4)) +
  scale_colour_manual(values = c("grey", "red")) +
  labs(colour = "Month") +
  theme_minimal()
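As a possible alternative to two layers, here is a minimal sketch that relies on geom_point() drawing points in the row order of the data: if the data frame is sorted so that the April rows come last, a single layer should also put them on top. The data is rebuilt so the block runs on its own, the month is extracted with base R's format() rather than lubridate, and df_sorted is just an illustrative name.

```r
library(ggplot2)

# Same simulated data as in the question; the month is extracted with base R
# here so that the sketch only depends on ggplot2.
set.seed(123)
date <- seq(as.POSIXct("2022-04-01 00:00:00"), as.POSIXct("2022-10-31 23:00:00"), by = "hour")
t <- abs(rnorm(length(date)))
y <- exp(t) + rnorm(length(date), mean = 0, sd = 3)
df <- data.frame(date = date, t = t, y = y)
df$month <- as.integer(format(df$date, "%m"))

# order() on a logical puts FALSE before TRUE, so the April rows move to the
# end of the data frame and are therefore drawn last.
df_sorted <- df[order(df$month == 4), ]

p <- ggplot(df_sorted, aes(t, y)) +
  geom_point(aes(colour = month == 4)) +
  # Naming the values ties each colour to the logical it belongs to.
  scale_colour_manual(values = c("FALSE" = "grey", "TRUE" = "red")) +
  labs(colour = "April") +
  theme_minimal()
p
```

This keeps everything in one layer, at the cost of reordering the data before plotting.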
You probably want a nicer legend, so you can construct a factor variable from the highlighting condition and map that to colour:
df$highlight <- factor(df$month == 4,
                       levels = c(TRUE, FALSE),
                       labels = c("April", "Other"))
ggplot(df, aes(t, y)) +
  geom_point(data = df[df$highlight == "Other", ], aes(colour = highlight)) +
  geom_point(data = df[df$highlight == "April", ], aes(colour = highlight)) +
  scale_colour_manual(values = c("grey", "red")) +
  labs(colour = "Month") +
  theme_minimal()
But since the order of the legend follows the plotting order, Other comes first in the legend, and it looks weird. This can be corrected by specifying both the breaks and the values for the colour scale:
ggplot(df, aes(t, y)) +
  geom_point(data = df[df$highlight == "Other", ], aes(colour = highlight)) +
  geom_point(data = df[df$highlight == "April", ], aes(colour = highlight)) +
  scale_colour_manual(breaks = c("April", "Other"),
                      values = c("red", "grey")) +
  labs(colour = "Month") +
  theme_minimal()
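If you need this for more than one month, the same two-layer idea could be wrapped in a small function. The sketch below is only an illustration: plot_highlight_month(), its arguments and the example data frame are hypothetical names, and it assumes a data frame with columns t, y and a numeric month (1 to 12). It passes a named colour vector to scale_colour_manual(), so each colour is matched to its factor level by name rather than by position.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): highlights one month of a data
# frame that has columns t, y and a numeric month (1-12).
plot_highlight_month <- function(data, highlight_month, highlight_colour = "red") {
  label <- month.name[highlight_month]  # base R constant, e.g. 4 -> "April"
  data$highlight <- factor(data$month == highlight_month,
                           levels = c(TRUE, FALSE),
                           labels = c(label, "Other"))
  # Named colours are matched to factor levels by name, independent of the
  # order in which the layers are drawn.
  colours <- setNames(c(highlight_colour, "grey"), c(label, "Other"))

  ggplot(data, aes(t, y, colour = highlight)) +
    geom_point(data = data[data$highlight == "Other", ]) +  # background first
    geom_point(data = data[data$highlight == label, ]) +    # highlighted on top
    scale_colour_manual(breaks = names(colours), values = colours) +
    labs(colour = "Month") +
    theme_minimal()
}

# Small made-up data set with 100 points for each month from April to October
set.seed(123)
example <- data.frame(t = abs(rnorm(700)), month = rep(4:10, each = 100))
example$y <- exp(example$t) + rnorm(700, mean = 0, sd = 3)

p <- plot_highlight_month(example, highlight_month = 6)
p
```

Calling plot_highlight_month(df, 4) on the data frame from the question should give the same kind of plot as the last example above.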