r ggplot2 boxplot

Highlight points on grouped box plot


This is a different question, but it follows on from this one: "R boxplot Subset column based on value in another column".

UPDATED

My dataset looks like this:

Term  Name   True   Result  Gender
T1    Name1  True   4       F
T2    Name2  False  6       F
T3    Name3  True   5.5     M
T3    Name4  False  4.6     M

The test dataset:

dataset_test <- structure(list(Term = c("T1", "T1", "T1", "T1", "T1", "T1", "T2", 
"T2", "T2", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T3", 
"T3", "T3"), Name = c("Name1", "Name2", "Name3", "Name4", "Name5", 
"Name6", "Name5", "Name6", "Name7", "Name8", "Name9", "Name10", 
"Name11", "Name12", "Name13", "Name14", "Name15", "Name16", "Name17", 
"Name18"), TRUE. = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, 
TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
FALSE, TRUE, TRUE), Result = c(4, 5, 6, 4, 5, 6, 5.5, 4.6, 5.5, 
4.6, 5, 5.2, 6, 5.5, 4, 5.5, 4.8, 5, 5, 4.4), Gender = c("F", 
"F", "F", "M", "M", "M", "F", "F", "F", "F", "M", "M", "M", "F", 
"F", "F", "F", "M", "M", "M")), class = "data.frame", row.names = c(NA, 
-20L))

I have a box plot grouped by gender, shown below. I want to highlight the points for the True records, and each highlighted point needs to sit on the box for its own gender, i.e. the points must align with the gender of the True record.

Solution credited to chemdork123

library(dplyr)
library(ggplot2)

dataset_test %>%
  group_by(Term) %>%
  filter(any(TRUE.)) %>%                    # keep terms with at least one TRUE record
  ggplot(aes(x = Term, y = Result, fill = Gender)) +
  scale_fill_brewer(palette = "Blues") +
  geom_boxplot(position = position_dodge(0.8)) +
  geom_point(                               # add the highlight points
    data = subset(dataset_test, TRUE. == TRUE),
    aes(x = Term, y = Result), position = position_dodge(0.8),
    color = "blue", size = 4, show.legend = FALSE) +
  ggtitle("Distribution of results by term") +
  xlab("Term") + ylab("Result")

position_dodge() now works perfectly when a term has True records for both genders, but it breaks when only one gender has them. Unfortunately, that is the main use case for this visualisation.

The code above produces this:

[Image: grouped box plot with the highlighted points]

Again, any help would be greatly appreciated.


Solution

  • You were probably close: you need to use position_dodge() in the geom_point() call. To be sure that the points align with the boxplots, you should also set the width of position_dodge() explicitly for the boxplot geom, using the same value in both layers. I also include show.legend = FALSE for geom_point() here, since you likely don't want the blue dots in the legend as in your example:

    # `dataset` is the asker's data frame (called dataset_test in the question)
    library(dplyr)
    library(ggplot2)

    dataset %>%
      group_by(Term) %>%
      filter(any(TRUE.)) %>%
      ggplot(aes(x = Term, y = Result, fill = Gender)) +
      scale_fill_brewer(palette = "Blues") +
      geom_boxplot(position = position_dodge(0.8)) +
      geom_point(                               # add the highlight points
        data = subset(dataset, TRUE. == TRUE),
        aes(x = Term, y = Result), position = position_dodge(0.8),
        color = "blue", size = 4, show.legend = FALSE) +
      ggtitle("Distribution of results by term") +
      xlab("Term") + ylab("Result")
    

    [Image: grouped box plot with the highlighted points dodged by gender]
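
A note on the case the question flags, where a term has True records for only one gender (T1 in the test dataset). position_dodge() spaces only the groups that are actually present in a layer's data at each x position, so a lone gender is likely to be drawn in the middle of the term rather than over its own box. One possible workaround, which is not part of the answer above and is only a sketch, is to pad the highlight data with tidyr::complete() so that every Term/Gender pair has at least one row. The padded rows have NA in Result, so nothing should be drawn for them, but they still reserve a dodge slot. The name `highlight` and the compact rebuild of dataset_test are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same 20 rows as the question's test dataset, rebuilt compactly.
# Gender is a factor so that complete() knows every level, even one
# that has no TRUE records anywhere.
dataset_test <- data.frame(
  Term   = rep(c("T1", "T2", "T3"), times = c(6, 7, 7)),
  Name   = paste0("Name", c(1:6, 5:11, 12:18)),
  TRUE.  = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
             FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE),
  Result = c(4, 5, 6, 4, 5, 6, 5.5, 4.6, 5.5, 4.6,
             5, 5.2, 6, 5.5, 4, 5.5, 4.8, 5, 5, 4.4),
  Gender = factor(rep(c("F", "M", "F", "M", "F", "M"),
                      times = c(3, 3, 4, 3, 4, 3)))
)

# Hypothetical helper data: the TRUE records, padded so that each
# Term/Gender combination appears at least once (NA Result if padded).
highlight <- dataset_test %>%
  filter(TRUE.) %>%
  complete(Term, Gender)

p <- dataset_test %>%
  group_by(Term) %>%
  filter(any(TRUE.)) %>%
  ungroup() %>%
  ggplot(aes(x = Term, y = Result, fill = Gender)) +
  scale_fill_brewer(palette = "Blues") +
  geom_boxplot(position = position_dodge(0.8)) +
  geom_point(
    data = highlight,                 # inherits x, y and fill from ggplot()
    position = position_dodge(0.8),   # same width as the boxplot layer
    color = "blue", size = 4,
    na.rm = TRUE,                     # silence the warning for padded NA rows
    show.legend = FALSE
  ) +
  labs(title = "Distribution of results by term", x = "Term", y = "Result")

print(p)
```

With the test data, the only padded row is T1/M, so the three T1 points for gender F should be dodged to the F box instead of sitting in the centre. I have not checked this against every ggplot2 version, so treat it as a starting point.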