Search code examples
rdplyrsummarykableextra

r summary for multilevel factor with many checkpoints


I have this long dataset with one factor variable and many levels, for simplicity I am just using two here(Curve, Line).

The values(score) can be Above(Above threshold) or Below (Below threshold) and it is measured at three different time points. Not all IDs are measured at all 3 time points. Some are measured during only t1, some t1, t2, some t1,t3, some all time points, t1,t2,t3.

Id  Factor  Level   Time    Score
1   Curve   Above   t1  16.11
1   Curve   Above   t2  15.67
1   Curve   Above   t3  11.24
2   Curve   Above   t1  17.93
2   Curve   Above   t2  11.82
2   Curve   Above   t3  12.95
3   Curve   Above   t1  12.68
4   Curve   Above   t1  11.53
4   Curve   Above   t2  11.74
4   Curve   Above   t3  14.40
5   Curve   Above   t1  14.48
5   Curve   Above   t3  17.32
6   Curve   Above   t1  11.61
6   Curve   Above   t2  14.96
7   Curve   Above   t1  14.00
7   Curve   Above   t2  10.02
7   Curve   Above   t3  14.52
8   Curve   Above   t1  11.85
8   Curve   Below   t3  3.26
9   Curve   Below   t1  2.49
9   Curve   Below   t3  7.00
10  Curve   Below   t2  3.68
10  Curve   Below   t3  1.62
11  Curve   Below   t1  8.08
11  Curve   Below   t2  1.59
11  Curve   Below   t3  1.59
1   Line    Above   t1  10.20
1   Line    Above   t2  13.20
1   Line    Above   t3  15.85
2   Line    Above   t1  19.80
2   Line    Above   t2  11.99
3   Line    Above   t3  17.32
3   Line    Above   t1  10.43
4   Line    Above   t1  12.34
4   Line    Above   t2  14.25
5   Line    Above   t3  14.72
5   Line    Above   t1  15.02
6   Line    Above   t3  17.94
6   Line    Above   t1  19.65
7   Line    Above   t1  18.75
7   Line    Below   t3  3.25
9   Line    Below   t1  2.43
10  Line    Below   t1  7.51
11  Line    Below   t3  2.15
11  Line    Below   t1  7.47
12  Line    Below   t1  1.56
12  Line    Below   t3  6.03
13  Line    Below   t1  4.98

What I am trying to do is create a summary like this.

enter image description here

I can do this manually or in excel but the fact that I have more than 20 levels in my actual dataset makes this process tedious and inefficient. I am looking for help inorder to make this process more efficient and faster in r. Appreciate your time and assistance. Thanks in advance.


Solution

  • I think you want this?

    library(tidyverse)
    #> Warning: package 'tidyverse' was built under R version 4.1.3
    
    df <- read.table(text = 'Id  Factor  Level   Time    Score
    1   Curve   Above   t1  16.11
    1   Curve   Above   t2  15.67
    1   Curve   Above   t3  11.24
    2   Curve   Above   t1  17.93
    2   Curve   Above   t2  11.82
    2   Curve   Above   t3  12.95
    3   Curve   Above   t1  12.68
    4   Curve   Above   t1  11.53
    4   Curve   Above   t2  11.74
    4   Curve   Above   t3  14.40
    5   Curve   Above   t1  14.48
    5   Curve   Above   t3  17.32
    6   Curve   Above   t1  11.61
    6   Curve   Above   t2  14.96
    7   Curve   Above   t1  14.00
    7   Curve   Above   t2  10.02
    7   Curve   Above   t3  14.52
    8   Curve   Above   t1  11.85
    8   Curve   Below   t3  3.26
    9   Curve   Below   t1  2.49
    9   Curve   Below   t3  7.00
    10  Curve   Below   t2  3.68
    10  Curve   Below   t3  1.62
    11  Curve   Below   t1  8.08
    11  Curve   Below   t2  1.59
    11  Curve   Below   t3  1.59
    1   Line    Above   t1  10.20
    1   Line    Above   t2  13.20
    1   Line    Above   t3  15.85
    2   Line    Above   t1  19.80
    2   Line    Above   t2  11.99
    3   Line    Above   t3  17.32
    3   Line    Above   t1  10.43
    4   Line    Above   t1  12.34
    4   Line    Above   t2  14.25
    5   Line    Above   t3  14.72
    5   Line    Above   t1  15.02
    6   Line    Above   t3  17.94
    6   Line    Above   t1  19.65
    7   Line    Above   t1  18.75
    7   Line    Below   t3  3.25
    9   Line    Below   t1  2.43
    10  Line    Below   t1  7.51
    11  Line    Below   t3  2.15
    11  Line    Below   t1  7.47
    12  Line    Below   t1  1.56
    12  Line    Below   t3  6.03
    13  Line    Below   t1  4.98', header = TRUE)
    
    df %>% 
      group_by(Factor, Time) %>% 
      summarise(median= median(Score),
                first = quantile(Score, 0.25),
                third = quantile(Score, 0.75),
                average = mean(Score[Level == 'Below']),
                perc = sum(Level == 'Below')*100/n(),
                .groups = 'drop') %>% 
      pivot_wider(id_cols = Factor, names_from = Time, values_from = median:perc, names_vary = 'slowest')
    #> # A tibble: 2 x 16
    #>   Factor median_t1 first_t1 third_t1 average_t1 perc_t1 median_t2 first_t2
    #>   <chr>      <dbl>    <dbl>    <dbl>      <dbl>   <dbl>     <dbl>    <dbl>
    #> 1 Curve       12.3    11.5      14.4       5.28    20        11.7     6.85
    #> 2 Line        10.3     6.85     16.0       4.79    41.7      13.2    12.6 
    #> # ... with 8 more variables: third_t2 <dbl>, average_t2 <dbl>, perc_t2 <dbl>,
    #> #   median_t3 <dbl>, first_t3 <dbl>, third_t3 <dbl>, average_t3 <dbl>,
    #> #   perc_t3 <dbl>
    

    Created on 2022-04-08 by the reprex package (v2.0.1)