r ggplot2 plot aes geom

ggplot2: scatterplot with two variables (measured on the same scale) on the y-axis: how do I change the aesthetics & add separate regression lines?


For my thesis, I am making scatterplots in APA format in R. So far, my code is as follows, and it works great for plotting just one variable with a confidence interval and a regression line:

  scatterplot=ggplot(dat, aes(x=STAIT, y=valence))+
    geom_point()+
    geom_smooth(method=lm,se=T, fullrange=T,colour='black')+
    labs(x='STAI-T score', y='Report length')+
    apatheme

However, I have two variables that were initially measured on the same 0-100 scale: valence and arousal. Instead of two separate plots, I thought it would be nice to show both variables in a single plot, using 'valence/arousal score' as the y-axis label and open/closed dots to indicate which data points come from which variable, a bit like in this example I found online. In that example, however, the data come from different groups, so that code doesn't work on my data. I've tried different things, and the closest I get is with the following code:

sp.both=ggplot(dat, aes(x=STAIT))+
  geom_point(aes(y=valence)) +
  geom_point(aes(y=arousal)) +
  apatheme

This gives me a scatterplot with the data points of both variables in the same plot. However, I need the data points of one score to be visually different from the other, and I want to add a separate regression line for each variable. Everything I've tried so far has resulted in errors, and I cannot find any examples online of people trying to do the same thing.

Any help would be highly appreciated!


Solution

  • Using some random example data, you could achieve your desired result like so:

    It's best to reshape your data to long format, e.g. with tidyr::pivot_longer, which gives two new columns: one with the names of the variables and one with the corresponding values. After reshaping, you can map the values to y and get different shapes and linetypes by mapping the variable-name column to shape and linetype:

    library(ggplot2)
    library(tidyr)
    
    set.seed(42)
    dat <- data.frame(
      STAIT = runif(20, 0, 1),
      valence = runif(20, 0, 1),
      arousal = runif(20, 0, 1)
    )
    
    dat_long <- dat %>%
      pivot_longer(c(valence, arousal), names_to = "var", values_to = "value")
    
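    # Map the variable name to both shape and linetype so each measure
    # gets its own point symbol and its own regression line
    ggplot(dat_long, aes(x = STAIT, y = value, linetype = var, shape = var)) +
      geom_point() +
      # `linewidth` replaces `size` for lines as of ggplot2 3.4.0
      geom_smooth(method = "lm", se = FALSE, color = "black", linewidth = .5)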
    #> `geom_smooth()` using formula 'y ~ x'
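
The question also asks for open/closed dots, a shared y-axis label and a confidence band. A minimal sketch of how that could be layered onto the long-format approach is shown here. The shape codes (16 for a filled circle, 1 for an open circle), the legend title "Measure" and the use of theme_classic() as a stand-in for the asker's `apatheme` object (which is not defined in the post) are assumptions for illustration, not part of the original answer.

```r
library(ggplot2)
library(tidyr)

# Same random example data as in the answer
set.seed(42)
dat <- data.frame(
  STAIT = runif(20, 0, 1),
  valence = runif(20, 0, 1),
  arousal = runif(20, 0, 1)
)

# Long format: one row per (observation, variable) pair
dat_long <- pivot_longer(dat, c(valence, arousal),
                         names_to = "var", values_to = "value")

p <- ggplot(dat_long, aes(x = STAIT, y = value, shape = var, linetype = var)) +
  geom_point() +
  # One regression line (with confidence band) per variable
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE,
              fullrange = TRUE, colour = "black") +
  # Assumed choice: filled dots for valence, open dots for arousal
  scale_shape_manual(values = c(valence = 16, arousal = 1)) +
  scale_linetype_manual(values = c(valence = "solid", arousal = "dashed")) +
  # Giving both legends the same title lets ggplot2 combine them
  labs(x = "STAI-T score", y = "Valence/arousal score",
       shape = "Measure", linetype = "Measure") +
  # Stand-in for the asker's own `apatheme` object
  theme_classic()

p
```

With the real data, swapping theme_classic() for `apatheme` and dropping the example data frame should be the only changes needed, assuming `dat` has the columns STAIT, valence and arousal.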