Tags: r, ggplot2, confidence-interval, linegraph

How to plot paired means for multiple groups in one line graph?


I'm still learning R, so maybe this question is rather simple, but I just can't figure it out.

I want to plot the mean scores, with confidence intervals, from a questionnaire that was taken at three different time points: at baseline, after 4 cycles of therapy and after 8 cycles of therapy. The questionnaire contains 3 scales: sensory, motor and autonomic. I want a line graph with the time points (at baseline; after 4 cycles; after 8 cycles) on the X-axis and the scores on the Y-axis, with three differently coloured lines showing the mean score per time point for the sensory, motor and autonomic scales. I want to use ggplot.

I have a dataframe with the following columns:

  • ID -> which is the patient's ID (there are 60 patients total in my dataframe)
  • c0sen -> score for sensory scale at baseline
  • c4sen -> score for sensory scale after 4 cycles of therapy
  • c8sen -> score for sensory scale after 8 cycles of therapy
  • c0mot -> score for motor scale at baseline
  • c4mot -> score for motor scale after 4 cycles of therapy
  • c8mot -> score for motor scale after 8 cycles of therapy
  • c0aut -> score for autonomic scale at baseline
  • c4aut -> score for autonomic scale after 4 cycles of therapy
  • c8aut -> score for autonomic scale after 8 cycles of therapy

This is what I'm after:

[image of the desired plot not available]

I hope someone can help me! Many thanks in advance!


Solution

  • It's always a good idea to include your actual data in a question such as this, but the following should be pretty close to what you have:

    set.seed(123)
    
    df  <- data.frame(ID    = factor(1:60),
                      c0sen = rbinom(60, 15, 8.8/15),
                      c4sen = rbinom(60, 15, 9.2/15),
                      c8sen = rbinom(60, 15, 10/15),
                      c0mot = rbinom(60, 15, 8.1/15),
                      c4mot = rbinom(60, 15, 8.4/15),
                      c8mot = rbinom(60, 15, 8.6/15),
                      c0aut = rbinom(60, 15, 3/15),
                      c4aut = rbinom(60, 15, 3/15),
                      c8aut = rbinom(60, 15, 3.5/15))
    head(df)
    #>   ID c0sen c4sen c8sen c0mot c4mot c8mot c0aut c4aut c8aut
    #> 1  1    10     8     9     6     8     7     1     3     2
    #> 2  2     7    12    11     9     8    13     2     3     5
    #> 3  3     9    10    11     7    10     7     5     3     3
    #> 4  4     7    10    11     9     8     7     2     2     4
    #> 5  5     6     8    11     8     9     8     2     5     6
    #> 6  6    12     9     6     8     7     9     4     3     2
    

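    Now, this is simply in the wrong format for plotting with ggplot. You first need to get the data into long format and then summarize it. Here we reshape the data into long format using reshape2::melt, derive the time and modality columns from the original column names, and then compute the mean and an approximate 95% confidence interval for each group with summarize from dplyr: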

    library(reshape2)
    library(dplyr)
    
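    # melt() stacks the nine score columns into variable/value pairs, keeping ID
    summary_df <- melt(df, id.vars = "ID") %>% 
      # the second character of each column name is the cycle number (0, 4 or 8)
      mutate(time = as.numeric(substr(variable, 2, 2))) %>%
      # characters 3 to 5 give the scale (sen, mot or aut)
      transmute(ID, time, modality = as.factor(substr(variable, 3, 5)), 
                score = value) %>%
      group_by(modality, time) %>%
      # mean with a normal-approximation 95% interval: mean +/- 1.96 * SE
      summarize(mean = mean(score), 
                upper = mean + 1.96 * sd(score)/sqrt(length(score)),
                lower = mean - 1.96 * sd(score)/sqrt(length(score)))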
    

    This gives us something to work with:

    summary_df
    #> # A tibble: 9 x 5
    #> # Groups:   modality [3]
    #>   modality  time  mean upper lower
    #>   <fct>    <dbl> <dbl> <dbl> <dbl>
    #> 1 aut          0  2.93  3.35  2.52
    #> 2 aut          4  2.87  3.25  2.48
    #> 3 aut          8  3.45  3.89  3.01
    #> 4 mot          0  7.95  8.38  7.52
    #> 5 mot          4  8.48  8.99  7.98
    #> 6 mot          8  8.62  9.15  8.09
    #> 7 sen          0  8.7   9.18  8.22
    #> 8 sen          4  9.17  9.63  8.71
    #> 9 sen          8 10.1  10.5   9.70
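
The reshaping step can also be done with tidyr::pivot_longer, which may be preferable because reshape2 has been retired in favour of tidyr. The following is only a minimal sketch and not part of the original answer. The three-patient `toy` data frame (values taken from the first three rows of head(df) above) and the name `long_df` are made up for illustration, and the regular expression assumes every score column is named `c`, then one digit, then the scale abbreviation.

```r
library(tidyr)
library(dplyr)

# Hypothetical three-patient data frame using the same column naming scheme
toy <- data.frame(ID    = factor(1:3),
                  c0sen = c(10, 7, 9), c4sen = c(8, 12, 10), c8sen = c(9, 11, 11),
                  c0mot = c(6, 9, 7),  c4mot = c(8, 8, 10),  c8mot = c(7, 13, 7),
                  c0aut = c(1, 2, 5),  c4aut = c(3, 3, 3),   c8aut = c(2, 5, 3))

# Split each column name into cycle number and scale while pivoting to long format
long_df <- toy %>%
  pivot_longer(cols = -ID,
               names_to = c("time", "modality"),
               names_pattern = "^c(\\d)(\\w+)$",
               values_to = "score") %>%
  mutate(time = as.numeric(time),       # "0", "4", "8" become numbers
         modality = factor(modality))   # levels: aut, mot, sen

head(long_df, 3)
```

A data frame shaped like `long_df` can then be passed to the same group_by(modality, time) and summarize() calls used above.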
    

    Now we plot using a combination of geom_line, geom_point and geom_errorbar:

    library(ggplot2)
    
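    # In ggplot2 3.4.0 and later, size is deprecated for lines:
    # use linewidth = 1 in geom_line() and geom_errorbar() instead
    ggplot(summary_df, aes(x = time, y = mean, colour = modality)) + 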
      geom_line(size = 1) + 
      geom_point(aes(shape = modality), size = 3) +
      geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, size = 1) +
      theme_classic() +
      scale_color_discrete(labels = c("Autonomic", "Motor", "Sensory")) +
      scale_shape_discrete(labels = c("Autonomic", "Motor", "Sensory")) +
      theme(legend.position = "bottom", text = element_text(size = 12)) +
      labs(x = "Cycles", y = "Symptom score")
    

    Giving us the desired result:

    Created on 2020-07-02 by the reprex package (v0.3.0)
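
The question asks for the time points themselves on the X-axis, and the motor and sensory error bars in the plot above overlap. Both could be addressed roughly as follows. This is a sketch rather than part of the original answer: the summary values are copied from the table printed earlier, and the dodge width of 0.4 and the axis labels are arbitrary choices.

```r
library(ggplot2)

# Summary values copied from the summary_df table printed earlier
summary_df <- data.frame(
  modality = factor(rep(c("aut", "mot", "sen"), each = 3)),
  time     = rep(c(0, 4, 8), times = 3),
  mean     = c(2.93, 2.87, 3.45, 7.95, 8.48, 8.62, 8.70, 9.17, 10.1),
  upper    = c(3.35, 3.25, 3.89, 8.38, 8.99, 9.15, 9.18, 9.63, 10.5),
  lower    = c(2.52, 2.48, 3.01, 7.52, 7.98, 8.09, 8.22, 8.71, 9.70)
)

# Shift the three series sideways slightly so their error bars do not overlap
dodge <- position_dodge(width = 0.4)

p <- ggplot(summary_df, aes(x = time, y = mean, colour = modality)) +
  geom_line(position = dodge) +
  geom_point(aes(shape = modality), size = 3, position = dodge) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, position = dodge) +
  # Show only the three measured time points, with descriptive labels
  scale_x_continuous(breaks = c(0, 4, 8),
                     labels = c("Baseline", "After 4 cycles", "After 8 cycles")) +
  scale_color_discrete(labels = c("Autonomic", "Motor", "Sensory")) +
  scale_shape_discrete(labels = c("Autonomic", "Motor", "Sensory")) +
  theme_classic() +
  theme(legend.position = "bottom") +
  labs(x = "Time point", y = "Symptom score")

p
```

The same `dodge` object is passed to all three layers so that the lines, points and error bars stay aligned with each other.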