r ggplot2 scatter-plot

How to make a scatterplot with grouping variables?



Say I have a dataset that looks like the one shown in the picture, and I want to make a scatterplot whose x-axis shows the results from test phase A and whose y-axis shows the results from test phase B. The results variable is continuous. There should be four data points in the graph, one for each of the four subjects. How can I do this with ggplot in R?

ggplot(data, aes(x = results, y = results)) + geom_point()

This is what I tried, but it does not give me what I wanted.


Solution

  • You'll need the test_phase data in wide format here, because the results for phases A and B provide the x and y coordinates of each point. This can be achieved with tidyr::pivot_wider. After that, it is all down to formatting with ggplot.

    # Some dummy data
    
    set.seed(12)
    df1 <- data.frame(subject = rep(1:4, 2),
                      test_phase = rep(c("A", "B"), each = 4),
                      results = sample(1000:6000, 8))
    
    # packages for data wrangling and plotting
    library(tidyr)
    library(ggplot2)
    
    df1 |> 
      pivot_wider(names_from = test_phase, values_from = results) |> 
      ggplot(aes(A, B, colour = factor(subject))) +
      geom_point() +
      labs(colour = "Subject")
    

    Created on 2023-06-26 with reprex v2.0.2
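
To see why the reshape matters, it may help to inspect the data before and after pivoting. The sketch below reuses the dummy data from the answer; the name `wide` is introduced here for illustration only. In long format each subject occupies two rows, so `aes(x = results, y = results)` maps the same column to both axes and every point lands on the diagonal. After `pivot_wider()` there is one row per subject, with separate A and B columns to map to x and y.

```r
library(tidyr)

# Same dummy data as in the answer above
set.seed(12)
df1 <- data.frame(subject = rep(1:4, 2),
                  test_phase = rep(c("A", "B"), each = 4),
                  results = sample(1000:6000, 8))

# Long format: one row per subject and test phase (8 rows)
nrow(df1)

# Wide format: one row per subject, with the phases spread into columns.
# `wide` is an illustrative name, not part of the original answer.
wide <- pivot_wider(df1, names_from = test_phase, values_from = results)

dim(wide)    # 4 rows, 3 columns
names(wide)  # "subject" "A" "B"
```

Assuming the real data have the same three columns, the `wide` object can be passed straight to `ggplot(wide, aes(A, B))`, which is what the piped version in the answer does in one step.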