Tags: r, function, dplyr, gtsummary

Function not picking up level that detects NA observations


I am trying to create a function that cross-tabulates whether the missing/absent values in two variables overlap.

The function takes two variables and the dataset. It looks like this:

absent_2by2 <- function(var1, var2, data){
  
  require(tidyverse)
  require(gtsummary)
  require(data.table)
  
  
  data %>% 
    as.data.table() %>% 
    mutate(var1_c = 0) %>% 
    .[!is.na(var1), var1_c := 1] %>% 
    .[is.na(var1), var1_c := 2] %>%
    mutate(var1_c = as.factor(var1_c),
           var1_c = fct_recode(var1_c,
                               "Present" = "1",
                               "Absent" = "2")
           ) %>% 
    mutate(var2_c = 0) %>% 
    .[!is.na(var2), var2_c := 1] %>% 
    .[is.na(var2), var2_c := 2] %>% 
    mutate(var2_c = as.factor(var2_c),
           var2_c = fct_recode(var2_c,
                               "Present" = "1",
                               "Absent" = "2")
           ) %>% 
           gtsummary::tbl_cross(data, 
                                 var2_c, var1_c,
                                 percent = "no")
  }

When I call the function using the following code:

absent_2by2("Ozone", "Solar.R", airquality)

...the output looks like this:

[Image: screenshot of the table returned by the function, not preserved]

...and these are the warnings I get:

Warning messages:
1: Problem with `mutate()` column `var1_c`.
ℹ `var1_c = fct_recode(var1_c, Present = "1", Absent = "2")`.
ℹ Unknown levels in `f`: 2 
2: Problem with `mutate()` column `var2_c`.
ℹ `var2_c = fct_recode(var2_c, Present = "1", Absent = "2")`.
ℹ Unknown levels in `f`: 2 

It seems like the function is not picking up level 2 of either variable. I'm not sure why, because when I run the same code as a standalone pipe, outside the function, I get the correct output. The standalone code looks like this:

require(tidyverse)
require(gtsummary)
require(data.table)

airquality %>%
  as.data.table() %>%
  mutate(var1_c = 0) %>%
  .[!is.na(Ozone), var1_c := 1] %>%
  .[is.na(Ozone), var1_c := 2] %>%
  mutate(var1_c = as.factor(var1_c),
         var1_c = fct_recode(var1_c,
                             "Present" = "1",
                             "Absent" = "2")
  ) %>%
  mutate(var2_c = 0) %>%
  .[!is.na(Solar.R), var2_c := 1] %>%
  .[is.na(Solar.R), var2_c := 2] %>%
  mutate(var2_c = as.factor(var2_c),
         var2_c = fct_recode(var2_c,
                             "Present" = "1",
                             "Absent" = "2")
  ) %>%
  gtsummary::tbl_cross(.,
                       var2_c, var1_c,
                       percent = "no")

Output looks like this:

[Image: screenshot of the table from the standalone code, not preserved]

I would appreciate it if anyone can guide me on this. Thank you!


Solution

  • I think this should work for you.

    absent_2by2 <- function(data, var1, var2) {
      # recode var1 and var2 as two-level factors flagging NA values
      data <-
        dplyr::mutate(
          data,
          dplyr::across(
            .cols = dplyr::all_of(c(var1, var2)),
            .fns = ~factor(is.na(.),
                           levels = c(FALSE, TRUE),
                           labels = c("Present", "Absent"))
          )
        )

      # cross tabulate missing values
      gtsummary::tbl_cross(data, row = dplyr::all_of(var1), col = dplyr::all_of(var2))
    }
    
    absent_2by2(gtsummary::trial, "age", "trt")
    

    [Image: screenshot of the resulting cross table, not preserved]
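
For anyone wondering where the original warnings come from: inside the question's function, `var1` and `var2` hold the strings "Ozone" and "Solar.R", so `is.na(var1)` tests the name itself rather than the column. A name is never missing, so no row is ever coded 2, the factor never gets a level "2", and `fct_recode()` warns about an unknown level. Here is a minimal base-R sketch of the difference, using the built-in `airquality` data (`col_name`, `col_missing` and `presence` are illustrative names, not part of the original code):

```r
# Inside the function the argument is a column *name*, not the column itself
col_name <- "Ozone"

# Testing the name: a single non-missing string, so the result is FALSE
is.na(col_name)

# Testing the column looked up by name: one TRUE/FALSE per row
col_missing <- is.na(airquality[[col_name]])

# Recode the same way as the answer: FALSE -> "Present", TRUE -> "Absent"
presence <- factor(col_missing,
                   levels = c(FALSE, TRUE),
                   labels = c("Present", "Absent"))
table(presence)
```

With the answer's argument order (data first), the call from the question becomes `absent_2by2(airquality, "Ozone", "Solar.R")`.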