I have this long dataset with one factor variable and many levels, for simplicity I am just using two here(Curve, Line).
The values(score) can be Above(Above threshold) or Below (Below threshold) and it is measured at three different time points. Not all IDs are measured at all 3 time points. Some are measured during only t1, some t1, t2, some t1,t3, some all time points, t1,t2,t3.
Id Factor Level Time Score
1 Curve Above t1 16.11
1 Curve Above t2 15.67
1 Curve Above t3 11.24
2 Curve Above t1 17.93
2 Curve Above t2 11.82
2 Curve Above t3 12.95
3 Curve Above t1 12.68
4 Curve Above t1 11.53
4 Curve Above t2 11.74
4 Curve Above t3 14.40
5 Curve Above t1 14.48
5 Curve Above t3 17.32
6 Curve Above t1 11.61
6 Curve Above t2 14.96
7 Curve Above t1 14.00
7 Curve Above t2 10.02
7 Curve Above t3 14.52
8 Curve Above t1 11.85
8 Curve Below t3 3.26
9 Curve Below t1 2.49
9 Curve Below t3 7.00
10 Curve Below t2 3.68
10 Curve Below t3 1.62
11 Curve Below t1 8.08
11 Curve Below t2 1.59
11 Curve Below t3 1.59
1 Line Above t1 10.20
1 Line Above t2 13.20
1 Line Above t3 15.85
2 Line Above t1 19.80
2 Line Above t2 11.99
3 Line Above t3 17.32
3 Line Above t1 10.43
4 Line Above t1 12.34
4 Line Above t2 14.25
5 Line Above t3 14.72
5 Line Above t1 15.02
6 Line Above t3 17.94
6 Line Above t1 19.65
7 Line Above t1 18.75
7 Line Below t3 3.25
9 Line Below t1 2.43
10 Line Below t1 7.51
11 Line Below t3 2.15
11 Line Below t1 7.47
12 Line Below t1 1.56
12 Line Below t3 6.03
13 Line Below t1 4.98
What I am trying to do is create a summary like this.
I can do this manually or in excel but the fact that I have more than 20 levels in my actual dataset makes this process tedious and inefficient. I am looking for help inorder to make this process more efficient and faster in r. Appreciate your time and assistance. Thanks in advance.
I think you want this?
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.1.3
df <- read.table(text = 'Id Factor Level Time Score
1 Curve Above t1 16.11
1 Curve Above t2 15.67
1 Curve Above t3 11.24
2 Curve Above t1 17.93
2 Curve Above t2 11.82
2 Curve Above t3 12.95
3 Curve Above t1 12.68
4 Curve Above t1 11.53
4 Curve Above t2 11.74
4 Curve Above t3 14.40
5 Curve Above t1 14.48
5 Curve Above t3 17.32
6 Curve Above t1 11.61
6 Curve Above t2 14.96
7 Curve Above t1 14.00
7 Curve Above t2 10.02
7 Curve Above t3 14.52
8 Curve Above t1 11.85
8 Curve Below t3 3.26
9 Curve Below t1 2.49
9 Curve Below t3 7.00
10 Curve Below t2 3.68
10 Curve Below t3 1.62
11 Curve Below t1 8.08
11 Curve Below t2 1.59
11 Curve Below t3 1.59
1 Line Above t1 10.20
1 Line Above t2 13.20
1 Line Above t3 15.85
2 Line Above t1 19.80
2 Line Above t2 11.99
3 Line Above t3 17.32
3 Line Above t1 10.43
4 Line Above t1 12.34
4 Line Above t2 14.25
5 Line Above t3 14.72
5 Line Above t1 15.02
6 Line Above t3 17.94
6 Line Above t1 19.65
7 Line Above t1 18.75
7 Line Below t3 3.25
9 Line Below t1 2.43
10 Line Below t1 7.51
11 Line Below t3 2.15
11 Line Below t1 7.47
12 Line Below t1 1.56
12 Line Below t3 6.03
13 Line Below t1 4.98', header = TRUE)
df %>%
group_by(Factor, Time) %>%
summarise(median= median(Score),
first = quantile(Score, 0.25),
third = quantile(Score, 0.75),
average = mean(Score[Level == 'Below']),
perc = sum(Level == 'Below')*100/n(),
.groups = 'drop') %>%
pivot_wider(id_cols = Factor, names_from = Time, values_from = median:perc, names_vary = 'slowest')
#> # A tibble: 2 x 16
#> Factor median_t1 first_t1 third_t1 average_t1 perc_t1 median_t2 first_t2
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Curve 12.3 11.5 14.4 5.28 20 11.7 6.85
#> 2 Line 10.3 6.85 16.0 4.79 41.7 13.2 12.6
#> # ... with 8 more variables: third_t2 <dbl>, average_t2 <dbl>, perc_t2 <dbl>,
#> # median_t3 <dbl>, first_t3 <dbl>, third_t3 <dbl>, average_t3 <dbl>,
#> # perc_t3 <dbl>
Created on 2022-04-08 by the reprex package (v2.0.1)