How to describe unique values of grouped observations for several vars?


I have a tibble in which each patient can be observed several times. The columns are: id_patient (num); id_eval (num); treat_1 (logical); treat_2 (logical); treat_1_type (char); treat_2_type (char).

What I want: a summary table (built with tbl_summary) that counts each patient only once, showing how many patients were concerned by each possibility at least once. Something like this:

var               All patients (n=N)
treat_1           AA (aa %)
treat_2           BB (bb %)
treat_1_type
  - Type_1        CC (cc %)
  - Type_2        DD (dd %)
treat_2_type
  - Type_1        EE (ee %)
  - Type_2        FF (ff %)
  - Type_3        GG (gg %)

What I have for now is :

evals %>%
    group_by(id_patient) %>%
    select(id_patient, treat_1, treat_2) %>%
    summarise(across(everything(), .fns = unique)) %>%
    summary()

But that returns every TRUE/FALSE combination observed for each patient, so a patient can appear on several rows and the result is not a true per-patient summary. And this is only the logical part, which is the easy one; it will not work with factors.

How can I achieve that?


Solution

  • It would have helped to have some sample data, but let's generate some ourselves.

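    library(tidyverse)

    # Simulated evaluations; no seed is set, so the sampled values differ between runs
    n <- 10
    evals <- tibble(
      id_patient = sample(1:50, n, replace = TRUE),   # a patient may appear more than once
      id_eval = sample(120:277, n),                   # evaluation ids are unique
      treat_1 = sample(c(TRUE, FALSE), n, replace = TRUE),
      treat_2 = sample(c(TRUE, FALSE), n, replace = TRUE),
      # factor() matches the <fct> columns in the output below; plain character vectors work too
      treat_1_type = factor(sample(c("Type_1", "Type_2"), n, replace = TRUE)),
      treat_2_type = factor(sample(c("Type_1", "Type_2", "Type_3"), n, replace = TRUE))
    )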
    
    evals
    

    output

    # A tibble: 10 x 6
       id_patient id_eval treat_1 treat_2 treat_1_type treat_2_type
            <int>   <int> <lgl>   <lgl>   <fct>        <fct>       
     1         42     237 TRUE    FALSE   Type_2       Type_3      
     2         24     240 FALSE   FALSE   Type_1       Type_1      
     3         10     236 TRUE    FALSE   Type_1       Type_3      
     4         27     153 TRUE    FALSE   Type_1       Type_2      
     5         29     126 TRUE    FALSE   Type_2       Type_1      
     6         18     194 FALSE   TRUE    Type_1       Type_2      
     7         18     215 TRUE    FALSE   Type_2       Type_2      
     8         48     205 TRUE    FALSE   Type_1       Type_3      
     9         12     131 FALSE   FALSE   Type_1       Type_2      
    10         13     225 FALSE   FALSE   Type_2       Type_3         
    

    That should be close enough to your data. Now let's build the summary you are after.

    seval <- evals %>%
      group_by(id_patient) %>%
      # Step 1: one row per patient, TRUE if the condition held at least once
      summarise(
        treat_1 = sum(treat_1) > 0,
        treat_2 = sum(treat_2) > 0,
        treat_1_Type_1 = sum(treat_1_type == "Type_1") > 0,
        treat_1_Type_2 = sum(treat_1_type == "Type_2") > 0,
        treat_2_Type_1 = sum(treat_2_type == "Type_1") > 0,
        treat_2_Type_2 = sum(treat_2_type == "Type_2") > 0,
        treat_2_Type_3 = sum(treat_2_type == "Type_3") > 0
      ) %>%
      # Step 2: count the patients for whom each flag is TRUE
      summarise(
        treat_1 = sum(treat_1),
        treat_2 = sum(treat_2),
        treat_1_Type_1 = sum(treat_1_Type_1),
        treat_1_Type_2 = sum(treat_1_Type_2),
        treat_2_Type_1 = sum(treat_2_Type_1),
        treat_2_Type_2 = sum(treat_2_Type_2),
        treat_2_Type_3 = sum(treat_2_Type_3)
      )
    
    
    

    output

    # A tibble: 1 x 7
      treat_1 treat_2 treat_1_Type_1 treat_1_Type_2 treat_2_Type_1 treat_2_Type_2 treat_2_Type_3
        <int>   <int>          <int>          <int>          <int>          <int>          <int>
    1       6       1              6              4              2              3              4
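The hand-written columns above need one line per type level. A possible way to avoid listing the levels by hand is to reshape the type columns to long format and count distinct patients per level. This is only a sketch on made-up toy data; the toy values and the names `flag_counts` and `type_counts` are assumptions for illustration, not part of the original answer.

```r
library(tidyverse)

# Toy data standing in for `evals` (values are made up for illustration)
evals <- tibble(
  id_patient   = c(1, 1, 2, 3),
  treat_1      = c(TRUE, FALSE, FALSE, TRUE),
  treat_2      = c(FALSE, FALSE, FALSE, TRUE),
  treat_1_type = c("Type_1", "Type_2", "Type_1", "Type_1"),
  treat_2_type = c("Type_3", "Type_3", "Type_1", "Type_2")
)

# Logical columns: TRUE if the patient was ever treated, then count such patients
flag_counts <- evals %>%
  group_by(id_patient) %>%
  summarise(across(c(treat_1, treat_2), any)) %>%
  summarise(across(c(treat_1, treat_2), sum))

# Type columns: keep each (patient, variable, level) once, then count patients
type_counts <- evals %>%
  pivot_longer(ends_with("_type"), names_to = "var", values_to = "level") %>%
  distinct(id_patient, var, level) %>%
  count(var, level, name = "n_patients")

flag_counts
type_counts
```

Note that `any(x)` and `sum(x) > 0` differ when values are missing: `any(c(TRUE, NA))` is TRUE while `sum(c(TRUE, NA)) > 0` is NA, so decide explicitly how NA should be handled if your data contain any.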
    

    Now you can easily calculate the proportions, dividing each count by the number of distinct patients:

    seval %>% 
      pivot_longer(everything(), names_to = "var", values_to = "val") %>% 
      group_by(var) %>% 
      mutate(prop = val/length(unique(evals$id_patient)))
    

    output

    # A tibble: 7 x 3
    # Groups:   var [7]
      var              val  prop
      <chr>          <int> <dbl>
    1 treat_1            6 0.667
    2 treat_2            1 0.111
    3 treat_1_Type_1     6 0.667
    4 treat_1_Type_2     4 0.444
    5 treat_2_Type_1     2 0.222
    6 treat_2_Type_2     3 0.333
    7 treat_2_Type_3     4 0.444  
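To get closer to the "AA (aa %)" layout requested in the question, the counts and proportions can be pasted into a single label column. A minimal sketch, assuming counts shaped like `seval`; the numbers below are made up, and `n_patients`, `report` and `label` are hypothetical names introduced here.

```r
library(tidyverse)

# Hypothetical counts in the same shape as `seval` (numbers are made up)
seval <- tibble(treat_1 = 6L, treat_2 = 1L, treat_1_Type_1 = 5L)
n_patients <- 9L  # stands in for length(unique(evals$id_patient))

report <- seval %>%
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  mutate(
    prop  = val / n_patients,
    # "%%" prints a literal percent sign; "%.1f" keeps one decimal place
    label = sprintf("%d (%.1f %%)", val, 100 * prop)
  )

report
```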
    

    I tested this with both chr and factor columns, and it works in both cases.
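Since the question asks for `tbl_summary`, one option is to reduce the data to one row per patient with only logical columns and pass that to gtsummary, which by default summarises logical columns as the count and percentage of TRUE. A minimal sketch, assuming the gtsummary package is installed; the toy data and the names `flags`, `type_flags`, `per_patient` and `tbl` are illustrative assumptions, not part of the original answer.

```r
library(tidyverse)

# Toy data standing in for `evals` (values are made up for illustration)
evals <- tibble(
  id_patient   = c(1, 1, 2, 3),
  treat_1      = c(TRUE, FALSE, FALSE, TRUE),
  treat_2      = c(FALSE, FALSE, FALSE, TRUE),
  treat_1_type = c("Type_1", "Type_2", "Type_1", "Type_1"),
  treat_2_type = c("Type_3", "Type_3", "Type_1", "Type_2")
)

# One row per patient: was each treatment ever given?
flags <- evals %>%
  group_by(id_patient) %>%
  summarise(across(c(treat_1, treat_2), any))

# One logical column per (type variable, level): did the patient ever have it?
type_flags <- evals %>%
  pivot_longer(ends_with("_type"), names_to = "var", values_to = "level") %>%
  distinct(id_patient, var, level) %>%
  mutate(seen = TRUE) %>%
  pivot_wider(
    id_cols = id_patient,
    names_from = c(var, level),
    values_from = seen,
    values_fill = FALSE
  )

per_patient <- left_join(flags, type_flags, by = "id_patient")

# Build the summary table only if gtsummary is available
if (requireNamespace("gtsummary", quietly = TRUE)) {
  tbl <- gtsummary::tbl_summary(select(per_patient, -id_patient))
}
```

Each row of the resulting table is then a flat "ever had this" indicator counted once per patient; the levels are not nested under their variable as in the layout sketched in the question.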