
How to make a scatterplot drawn with geom_jitter reproducible?


I am using the Australian AIDS Survival Data (Aids2 from the MASS package), this time to create scatterplots.

To show survival status by sex for each reported transmission category (T.categ), I plot the chart like this:

library(ggplot2)
library(dplyr)  # provides the %>% pipe used below

data <- read.csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/MASS/Aids2.csv")

data %>%
  ggplot() +
  geom_jitter(aes(T.categ, sex, colour = status))

It shows a chart, but each time I run the code it produces a slightly different one. Here are two runs side by side.

[Image: two runs of the same code, side by side]

Is anything wrong with my code? Is it normal for each run to produce a different chart?
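A quick way to see where the difference comes from is to build the same plot object twice and compare the computed point positions. This is a minimal diagnostic sketch: it uses ggplot2's layer_data(), which builds the plot the same way printing does, and it assumes MASS::Aids2 (shipped with R's MASS package, same variables as the CSV) in place of the download.

```r
# Diagnostic sketch: does the same plot object give the same point positions twice?
# Assumption: MASS::Aids2 stands in for the downloaded CSV (same variables).
library(ggplot2)

data <- MASS::Aids2

p <- ggplot(data) +
  geom_jitter(aes(T.categ, sex, colour = status))

# layer_data() builds the plot, which is the step where the jitter is drawn
build1 <- layer_data(p)
build2 <- layer_data(p)

# Same data and same categories, but the jittered x positions differ between builds
identical(build1$x, build2$x)
```

If the positions differ while the data are unchanged, the variation comes from the jitter, not from the data or the code.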


Solution

  • Try setting the seed when plotting:

    set.seed(1)  # fix the random number generator state right before the plot is drawn
    ggplot(data, aes(T.categ, sex, colour = status)) +
      geom_jitter()
    

    From the manual page (?geom_jitter):

    It adds a small amount of random variation to the location of each point, and is a useful way of handling overplotting caused by discreteness in smaller datasets.

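    So yes, a different chart on each run is normal. To make that "random variation" reproducible, call set.seed() immediately before the plot is drawn: the jitter is computed when the plot is built (printed), not when the ggplot object is created.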
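A minimal sketch of two ways to get identical positions, assuming a ggplot2 version whose position_jitter() has a seed argument (3.0.0 and later), and again using MASS::Aids2 in place of the downloaded CSV. Comparing layer_data() output stands in for comparing two rendered charts.

```r
library(ggplot2)

data <- MASS::Aids2

p <- ggplot(data, aes(T.categ, sex, colour = status)) +
  geom_jitter()

# Option 1: reset the global RNG immediately before each build/print
set.seed(1); a <- layer_data(p)
set.seed(1); b <- layer_data(p)
identical(a$x, b$x)  # TRUE

# Option 2: fix the seed inside the position adjustment itself, so this plot
# object jitters the same way every time it is built, whatever the global RNG state
p_fixed <- ggplot(data, aes(T.categ, sex, colour = status)) +
  geom_point(position = position_jitter(seed = 1))

c1 <- layer_data(p_fixed)
invisible(runif(10))  # consume some random numbers between the two builds
c2 <- layer_data(p_fixed)
identical(c1$x, c2$x)  # TRUE
```

Option 2 is handy when the plot object is saved and printed later, or when a point layer and a label layer must share the same jitter. With option 1, any random-number call between set.seed() and printing changes the result.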