I have a tibble in which each patient can be observed several times. The columns are: id_patient (num); id_eval (num); treat_1 (logical); treat_2 (logical); treat_1_type (char); treat_2_type (char).
What I want: a summary table (with tbl_summary) that counts each patient only once per value, so that I know how many patients were concerned by each possibility at least once. Something like this:
var | All patients (n=N) |
---|---|
treat_1 | AA (aa %) |
treat_2 | BB (bb %) |
treat_1_type | |
- Type_1 | CC (cc %) |
- Type_2 | DD (dd %) |
treat_2_type | |
- Type_1 | EE (ee %) |
- Type_2 | FF (ff %) |
- Type_3 | GG (gg %) |
What I have so far is:
evals %>%
  group_by(id_patient) %>%
  select(id_patient, treat_1, treat_2) %>%
  summarise(across(everything(), .fns = unique)) %>%
  summary()
But that gives me every TRUE/FALSE combination that exists for each patient, so it does not really count unique patients. And this is only the logical part, the easy one; it will not work with factors...
How do you think I can achieve that?
I wish you had given us a bit of data, but let's generate some ourselves.
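library(tidyverse)

n <- 10
# No seed is set, so every run draws different values.
evals <- tibble(
  id_patient   = sample(1:50, n, replace = TRUE),  # with replacement, so a patient can appear several times
  id_eval      = sample(120:277, n),               # without replacement, so evaluation ids are unique
  treat_1      = sample(c(TRUE, FALSE), n, replace = TRUE),
  treat_2      = sample(c(TRUE, FALSE), n, replace = TRUE),
  treat_1_type = sample(c("Type_1", "Type_2"), n, replace = TRUE),
  treat_2_type = sample(c("Type_1", "Type_2", "Type_3"), n, replace = TRUE)
)
evals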
output
# A tibble: 10 x 6
id_patient id_eval treat_1 treat_2 treat_1_type treat_2_type
<int> <int> <lgl> <lgl> <fct> <fct>
1 42 237 TRUE FALSE Type_2 Type_3
2 24 240 FALSE FALSE Type_1 Type_1
3 10 236 TRUE FALSE Type_1 Type_3
4 27 153 TRUE FALSE Type_1 Type_2
5 29 126 TRUE FALSE Type_2 Type_1
6 18 194 FALSE TRUE Type_1 Type_2
7 18 215 TRUE FALSE Type_2 Type_2
8 48 205 TRUE FALSE Type_1 Type_3
9 12 131 FALSE FALSE Type_1 Type_2
10 13 225 FALSE FALSE Type_2 Type_3
I hope this is close enough to your data. Now let's build the summary you want.
seval <- evals %>%
  group_by(id_patient) %>%
  # Step 1: one row per patient; each flag is TRUE if it was ever true for that patient
  summarise(
    treat_1        = sum(treat_1) > 0,
    treat_2        = sum(treat_2) > 0,
    treat_1_Type_1 = sum(treat_1_type == "Type_1") > 0,
    treat_1_Type_2 = sum(treat_1_type == "Type_2") > 0,
    treat_2_Type_1 = sum(treat_2_type == "Type_1") > 0,
    treat_2_Type_2 = sum(treat_2_type == "Type_2") > 0,
    treat_2_Type_3 = sum(treat_2_type == "Type_3") > 0
  ) %>%
  # Step 2: count the patients for whom each flag is TRUE
  summarise(
    treat_1        = sum(treat_1),
    treat_2        = sum(treat_2),
    treat_1_Type_1 = sum(treat_1_Type_1),
    treat_1_Type_2 = sum(treat_1_Type_2),
    treat_2_Type_1 = sum(treat_2_Type_1),
    treat_2_Type_2 = sum(treat_2_Type_2),
    treat_2_Type_3 = sum(treat_2_Type_3)
  )
seval
output
# A tibble: 1 x 7
treat_1 treat_2 treat_1_Type_1 treat_1_Type_2 treat_2_Type_1 treat_2_Type_2 treat_2_Type_3
<int> <int> <int> <int> <int> <int> <int>
1 6 1 6 4 2 4 4
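As an aside, the same "count each patient once" idea can be written without typing one column per level. The sketch below is only one possible variant, not the code used above: `toy`, `flag_counts` and `type_counts` are hypothetical names, and the toy data are made up for illustration. It collapses the logical columns with `any()` and reshapes the type columns to long format so that `distinct()` removes repeated patient/level pairs.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: patient 1 is observed twice, patients 2 and 3 once.
toy <- tibble(
  id_patient   = c(1, 1, 2, 3),
  treat_1      = c(TRUE, FALSE, FALSE, TRUE),
  treat_2      = c(FALSE, FALSE, FALSE, TRUE),
  treat_1_type = c("Type_1", "Type_2", "Type_1", "Type_1"),
  treat_2_type = c("Type_3", "Type_3", "Type_1", "Type_2")
)

# Logical columns: a patient counts once if the flag was ever TRUE.
flag_counts <- toy %>%
  group_by(id_patient) %>%
  summarise(across(c(treat_1, treat_2), any), .groups = "drop") %>%
  summarise(across(c(treat_1, treat_2), sum)) %>%
  pivot_longer(everything(), names_to = "var", values_to = "n_patients")

# Type columns: keep one row per (patient, variable, level), then count patients.
type_counts <- toy %>%
  select(id_patient, ends_with("_type")) %>%
  mutate(across(ends_with("_type"), as.character)) %>%  # so factors and characters behave alike
  pivot_longer(-id_patient, names_to = "var", values_to = "level") %>%
  distinct(id_patient, var, level) %>%
  count(var, level, name = "n_patients")

flag_counts
type_counts
```

Because the levels are read from the data, a level that never occurs does not get a row; if zero counts matter, the hand-written version above is the safer choice.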
Now you can easily calculate the proportions:
seval %>%
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  group_by(var) %>%
  # the denominator is the number of distinct patients, not the number of rows
  mutate(prop = val / length(unique(evals$id_patient)))
output
# A tibble: 7 x 3
# Groups: var [7]
var val prop
<chr> <int> <dbl>
1 treat_1 6 0.667
2 treat_2 1 0.111
3 treat_1_Type_1 6 0.667
4 treat_1_Type_2 4 0.444
5 treat_2_Type_1 2 0.222
6 treat_2_Type_2 4 0.444
7 treat_2_Type_3 4 0.444
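To get cells that look like the "AA (aa %)" entries in the question, the count and the proportion can be pasted together. This is a minimal sketch under assumptions: `counts`, `n_patients` and `labelled` are hypothetical names, and the two rows are copied by hand from the table above rather than computed.

```r
library(dplyr)

# Two rows copied from the long table above, for illustration only.
counts <- tibble(
  var = c("treat_1", "treat_2"),
  val = c(6L, 1L)
)

# Assumed denominator: the number of distinct patients,
# e.g. n_distinct(evals$id_patient) on the real data.
n_patients <- 9

labelled <- counts %>%
  mutate(
    prop  = val / n_patients,
    label = sprintf("%d (%.1f %%)", val, 100 * prop)  # "%%" prints a literal percent sign
  )

labelled
```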
I tested everything with both chr and factor variables, and everything works fine.
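Since the question asks for tbl_summary, one possible route is to stop after the first `summarise()` step, so that there is one row per patient, and hand that table to gtsummary. The sketch below assumes the gtsummary package is installed (the call is skipped otherwise); `toy` and `per_patient` are hypothetical names and the data are made up. As far as I know, `tbl_summary()` reports logical columns as the count and percentage of TRUE, which is the "at least once" figure here.

```r
library(dplyr)

# Hypothetical toy data: patient 1 is observed twice, patients 2 and 3 once.
toy <- tibble(
  id_patient   = c(1, 1, 2, 3),
  treat_1      = c(TRUE, FALSE, FALSE, TRUE),
  treat_2      = c(FALSE, FALSE, FALSE, TRUE),
  treat_1_type = c("Type_1", "Type_2", "Type_1", "Type_1")
)

# One row per patient: each column says whether the value occurred at least once.
per_patient <- toy %>%
  group_by(id_patient) %>%
  summarise(
    treat_1        = any(treat_1),
    treat_2        = any(treat_2),
    treat_1_Type_1 = any(treat_1_type == "Type_1"),
    treat_1_Type_2 = any(treat_1_type == "Type_2"),
    .groups = "drop"
  )

# The denominator of the table is now the number of patients, not of evaluations.
if (requireNamespace("gtsummary", quietly = TRUE)) {
  per_patient %>%
    select(-id_patient) %>%
    gtsummary::tbl_summary()
}
```

The type levels stay as separate indicator columns because a patient can have several types over time, so a single categorical column per patient could not hold them.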