How to generate the same plot with "jitter", and how to jitter selected points (not all points)?


What I would like to do is:

a) have the plot produced by the ggplot code be the same each time it runs [set.seed kind of notion?] and

b) have text labels jittered only for labels that have the same y-axis value -- leave the other text labels alone. This would seem to be some kind of conditional jittering based on a factor value for the points.

Here is some data:

dput(df)
structure(list(Firm = c("a verylongname", "b verylongname", "c verylongname", 
"d verylongname", "e verylongname", "f verylongname", "g verylongname", 
"h verylongname", "i verylongname", "j verylongname"), Sum = c(74, 
77, 79, 82, 85, 85, 88, 90, 90, 92)), .Names = c("Firm", "Sum"
), row.names = c(NA, 10L), class = "data.frame")

Here is ggplot code using df:

ggplot(df, aes(x = reorder(Firm, Sum, mean), y = Sum)) +
  geom_text(aes(label = Firm), size = 3, show.legend = FALSE, position = position_jitter(height = .9)) +
  theme(axis.text.x = element_blank()) +
  scale_x_discrete(expand = c(-1.1, 0)) +   # to show the lower left name fully
  labs(x = "", y = "", title = "")

Notice that in one version of the plot, h and i still overlap; each time I run the above code, the locations of the text labels change.

BTW, the approach in the question "conditional jitter" shifts the discrete values on the x-axis a bit, but I would like to shift only the overlapping points, and only along the y-axis.


Solution

  • One option is to add a column to mark overlapping points and then plot those separately. A better option might be to directly shift the y-values of the overlapping points, so that we get direct control over their placement. I show both options below.

    Option 1 (jitter): First, add a column to mark overlaps. In this case, because the points fall roughly on a line, we can mark a point as overlapping if its y-value is within 1 of any other point's y-value. You can include more complex conditions if it's important to check whether the x-values are close as well.

    df$overlap = sapply(seq_len(nrow(df)), function(i) {
      # Distance on the y-axis from point i to the nearest other point
      if (min(abs(df$Sum[i] - df$Sum[-i])) <= 1) "Overlap" else "Ignore"
    })
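
    If only labels with exactly the same y-value should count as overlapping (as in part (b) of the question), a simpler flag is possible. The sketch below uses base R's duplicated(), checking from both ends so that every member of a tie is marked, not just the second one. It rebuilds the question's df so it runs on its own; unlike the threshold test above, it ignores values that are close but not equal.

    ```r
    # Same data as dput(df) in the question
    df <- data.frame(
      Firm = paste(letters[1:10], "verylongname"),
      Sum  = c(74, 77, 79, 82, 85, 85, 88, 90, 90, 92),
      stringsAsFactors = FALSE
    )

    # TRUE for a value that also appears earlier OR later in Sum,
    # so both members of each tied pair are flagged
    tied <- duplicated(df$Sum) | duplicated(df$Sum, fromLast = TRUE)
    df$overlap <- ifelse(tied, "Overlap", "Ignore")

    df[df$overlap == "Overlap", ]
    ```

    With this data it flags the same four firms (e, f, h, i) as the threshold test, because no two distinct Sum values are closer than 2.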
    

    In the plot, I've colored the jittered points red so it's easy to tell which ones were affected.

    library(ggplot2)
    
    # Add set.seed() here to make the jitter reproducible
    ggplot(df, aes(x = reorder(Firm, Sum, mean))) +
      geom_text(data=df[df$overlap=="Overlap",], 
                aes(label = Firm, y = Sum), size = 3,  
                position = position_jitter(width=0, height = 1), colour="red") +
      geom_text(data=df[df$overlap=="Ignore",], 
                aes(label = Firm, y = Sum), size = 3) +
      theme(axis.text.x = element_blank()) +
      scale_x_discrete(expand = c(-1.1, 0)) +   # to show the lower left name fully
      labs(x = "", y = "", title = "")
    

    Option 2 (direct placement): Another option is to directly control how much the labels are shifted, rather than taking whatever jitter happens to give us. In this case, we know that we want to shift each pair of points with the same y-value. More complex logic would be necessary in cases where we need to worry about both x and y values, more than two points in the same overlap, and/or where we need to shift values that are close, but not exactly the same.

    library(dplyr)
    
    # Create a new column that shifts pairs of points with the same y-value by +/- 0.25
    df = df %>% group_by(Sum) %>%
      mutate(SumNoOverlap = if(n()>1) Sum + c(-0.25,0.25) else Sum)
    
    ggplot(df, aes(x = reorder(Firm, Sum, mean), y = SumNoOverlap)) +
      geom_text(aes(label = Firm), size = 3) +
      theme(axis.text.x = element_blank()) +
      scale_x_discrete(expand = c(-1.1, 0)) +   # to show the lower left name fully
      labs(x = "", y = "", title = "")
    

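    The pairwise shift above assumes every tie has exactly two members. With three tied labels, Sum + c(-0.25, 0.25) would be recycled (with a warning) and put the third label back on top of the first. One possible generalization, sketched here as an assumption rather than part of the original answer, centres the offsets of each tie on the shared value using dplyr's row_number() and n() within each group. The step of 0.5 is an arbitrary choice that reproduces the +/- 0.25 offsets for pairs.

    ```r
    library(dplyr)

    # Same data as dput(df) in the question
    df <- data.frame(
      Firm = paste(letters[1:10], "verylongname"),
      Sum  = c(74, 77, 79, 82, 85, 85, 88, 90, 90, 92),
      stringsAsFactors = FALSE
    )

    step <- 0.5  # vertical gap between labels in the same tie (assumed value)

    df <- df %>%
      group_by(Sum) %>%
      # Offsets are centred on the shared value: a singleton gets 0,
      # a pair gets -step/2 and +step/2, a triple gets -step, 0 and +step
      mutate(SumNoOverlap = Sum + (row_number() - (n() + 1) / 2) * step) %>%
      ungroup()

    # Gives 74, 77, 79, 82, 84.75, 85.25, 88, 89.75, 90.25, 92
    df$SumNoOverlap
    ```

    The resulting SumNoOverlap column can be used in the Option 2 ggplot() call unchanged.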

    Note: To make jitter reproducible, add set.seed(153) (or whatever seed value you want) before the jittered plot code.
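
    Alternatively, in ggplot2 3.0.0 and later, position_jitter() takes a seed argument, which fixes the jitter for that layer instead of relying on the global random number state. A minimal sketch, assuming such a ggplot2 version: make_plot() is a hypothetical helper that rebuilds the Option 1 plot (flagging exact ties only, for brevity), and layer_data() confirms that two separate builds put the jittered labels in the same places.

    ```r
    library(ggplot2)

    # Same data as dput(df) in the question, with exact ties flagged
    df <- data.frame(
      Firm = paste(letters[1:10], "verylongname"),
      Sum  = c(74, 77, 79, 82, 85, 85, 88, 90, 90, 92),
      stringsAsFactors = FALSE
    )
    tied <- duplicated(df$Sum) | duplicated(df$Sum, fromLast = TRUE)
    df$overlap <- ifelse(tied, "Overlap", "Ignore")

    # Hypothetical helper: the Option 1 plot with a seeded jitter layer
    make_plot <- function(data) {
      ggplot(data, aes(x = reorder(Firm, Sum, mean), y = Sum, label = Firm)) +
        geom_text(data = data[data$overlap == "Overlap", ], size = 3, colour = "red",
                  # seed = 153 fixes the vertical jitter for this layer only
                  position = position_jitter(width = 0, height = 1, seed = 153)) +
        geom_text(data = data[data$overlap == "Ignore", ], size = 3) +
        theme(axis.text.x = element_blank()) +
        labs(x = "", y = "", title = "")
    }

    # Two independent builds of the plot: compare the jittered y-positions
    y1 <- layer_data(make_plot(df), 1)$y
    y2 <- layer_data(make_plot(df), 1)$y
    identical(y1, y2)  # TRUE: same seed, same jitter
    ```

    Because the seed is stored in the layer, the label positions do not depend on where set.seed() was last called.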