R - How to show multiple lines with different x-axis values (dates) in ggplot2?


I conducted a longitudinal study on children's letter knowledge development. In this study, 120 children were given a letter knowledge task at three different time points.

The first six rows of df:

> head(df)
   id      T1_date T1_letters      T2_date T2_letters      T3_date T3_letters
1 101  2022-10-17           4  2023-05-15          18  2023-12-11          26
2 102  2022-10-18           9  2023-05-15          20  2023-12-11          30
3 103  2022-10-17          14  2023-05-15          30  2023-12-11          30
4 104  2022-10-18           7  2023-05-15          17  2023-12-11          27
5 105  2022-10-17           1  2023-05-16          11  2023-12-12          26
6 106  2022-10-17           2  2023-05-15          11  2023-12-12          26

The first column ("id") in this dataset shows the participant ID numbers. As you can see, each time point (i.e., session) took place on slightly different days for different children. So, one child may have a different date for T3 compared to another child.

Now, I would like to plot the children's letter knowledge scores over time. Every child should be visualized as a single line (so: 120 thin lines, connecting the three scores for each child). The x-axis of this plot should represent the date of each session (with each session having a slightly different date), and the y-axis should represent the letter knowledge scores (these are on a scale from 0 to 34, because we also used digraphs).

At the moment, I don't need different colors for each participant, so it would be fine if all lines are black. (However, it would be nice to have the option to change the colors for specific subjects later.)

Data:

df = structure(list(id = c(101, 102, 103, 104, 105, 106, 201, 202, 
203, 204, 205), T1_date = c("2022-10-17", "2022-10-18", "2022-10-17", 
"2022-10-18", "2022-10-17", "2022-10-17", "2022-12-01", "2022-12-01", 
"2022-12-01", "2022-11-23", "2022-11-23"), T1_letters = c(4, 
9, 14, 7, 1, 2, 3, 8, 0, 3, 8), T2_date = c("2023-05-15", "2023-05-15", 
"2023-05-15", "2023-05-15", "2023-05-16", "2023-05-15", "2023-03-28", 
"2023-03-28", "2023-03-29", "2023-03-27", "2023-03-27"), T2_letters = c(18, 
20, 30, 17, 11, 11, 4, 14, 4, 8, 8), T3_date = c("2023-12-11", 
"2023-12-11", "2023-12-11", "2023-12-11", "2023-12-12", "2023-12-12", 
"2023-09-21", "2023-09-21", "2023-09-21", "2023-09-18", "2023-09-18"
), T3_letters = c(26, 30, 30, 27, 26, 26, 10, 18, 8, 16, 18)), row.names = c(NA, 
-11L), class = "data.frame")

Solution

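  • You’ll need to reshape your data into a “long” format with one row per child per session. The pivot_longer() function from the tidyr package does this: names_sep = "_" splits each column name at the underscore, and the special ".value" entry in names_to turns the second part (date, letters) into output columns, while the first part (T1, T2, T3) goes into the session column. The dates are stored as character strings, so convert them with as.Date() to get a proper date axis.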

    library(tidyverse)
    
    long <- df |>
      pivot_longer(
        cols = !id,
        names_to = c("session", ".value"),
        names_sep = "_"
      ) |>
      mutate(date = as.Date(date))
    
    long
    #> # A tibble: 33 × 4
    #>       id session date       letters
    #>    <dbl> <chr>   <date>       <dbl>
    #>  1   101 T1      2022-10-17       4
    #>  2   101 T2      2023-05-15      18
    #>  3   101 T3      2023-12-11      26
    #>  4   102 T1      2022-10-18       9
    #>  5   102 T2      2023-05-15      20
    #>  6   102 T3      2023-12-11      30
    #>  7   103 T1      2022-10-17      14
    #>  8   103 T2      2023-05-15      30
    #>  9   103 T3      2023-12-11      30
    #> 10   104 T1      2022-10-18       7
    #> # ℹ 23 more rows
    

    With the data in long form, you can create the line graph with ggplot2, specifying the participant id as the group to get individual lines.

    ggplot(long, aes(date, letters, group = id)) + geom_line()
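The question also asks about colouring specific participants later. One possible way to do that, shown here as a sketch rather than the only approach, is to flag the chosen ids and draw them in a second geom_line() layer. The ids in `highlight_ids`, the `highlighted` column name, the grey background colour, and the axis labels are all assumptions made for illustration. The data is a three-child subset of the question's df so the example runs on its own.

```r
library(tidyverse)

# Three participants taken from the question's data, in the original wide layout
df <- data.frame(
  id = c(101, 102, 201),
  T1_date = c("2022-10-17", "2022-10-18", "2022-12-01"),
  T1_letters = c(4, 9, 3),
  T2_date = c("2023-05-15", "2023-05-15", "2023-03-28"),
  T2_letters = c(18, 20, 4),
  T3_date = c("2023-12-11", "2023-12-11", "2023-09-21"),
  T3_letters = c(26, 30, 10)
)

# Same reshape as in the answer: one row per child per session
long <- df |>
  pivot_longer(
    cols = !id,
    names_to = c("session", ".value"),
    names_sep = "_"
  ) |>
  mutate(date = as.Date(date))

# Hypothetical choice of participants to emphasise
highlight_ids <- c(102, 201)
long <- long |> mutate(highlighted = id %in% highlight_ids)

p <- ggplot(long, aes(date, letters, group = id)) +
  # Background: everyone who is not highlighted, in a single neutral colour
  geom_line(data = filter(long, !highlighted), colour = "grey60") +
  # Foreground: highlighted participants, one colour per id
  geom_line(data = filter(long, highlighted), aes(colour = factor(id))) +
  # The task is scored from 0 to 34, so fix the y-axis to that range
  scale_y_continuous(limits = c(0, 34)) +
  labs(x = "Session date", y = "Letters known", colour = "Participant")

p
```

Wrapping id in factor() gives each highlighted child a distinct discrete colour instead of a continuous gradient. To keep every other line black, as in the original request, replace "grey60" with "black".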