I am trying to create a function that cross-tabulates whether the missing/absent values in two variables overlap.
The function takes two variables and the dataset. It looks like this:
absent_2by2 <- function(var1, var2, data){
  require(tidyverse)
  require(gtsummary)
  require(data.table)
  data %>%
    as.data.table() %>%
    mutate(var1_c = 0) %>%
    .[!is.na(var1), var1_c := 1] %>%
    .[is.na(var1), var1_c := 2] %>%
    mutate(var1_c = as.factor(var1_c),
           var1_c = fct_recode(var1_c,
                               "Present" = "1",
                               "Absent" = "2")
    ) %>%
    mutate(var2_c = 0) %>%
    .[!is.na(var2), var2_c := 1] %>%
    .[is.na(var2), var2_c := 2] %>%
    mutate(var2_c = as.factor(var2_c),
           var2_c = fct_recode(var2_c,
                               "Present" = "1",
                               "Absent" = "2")
    ) %>%
    gtsummary::tbl_cross(data,
                         var2_c, var1_c,
                         percent = "no")
}
When I call the function using the following code:
absent_2by2("Ozone", "Solar.R", airquality)
...the output looks like this:
...and these are the warnings I get:
Warning messages:
1: Problem with `mutate()` column `var1_c`.
ℹ `var1_c = fct_recode(var1_c, Present = "1", Absent = "2")`.
ℹ Unknown levels in `f`: 2
2: Problem with `mutate()` column `var2_c`.
ℹ `var2_c = fct_recode(var2_c, Present = "1", Absent = "2")`.
ℹ Unknown levels in `f`: 2
It seems the function is not picking up level 2 for either of my variables. I am not sure why, because when I run the same code as a single pipe outside the function, I get the correct output. The standalone code looks like this:
require(tidyverse)
require(gtsummary)
require(data.table)
airquality %>%
  as.data.table() %>%
  mutate(var1_c = 0) %>%
  .[!is.na(Ozone), var1_c := 1] %>%
  .[is.na(Ozone), var1_c := 2] %>%
  mutate(var1_c = as.factor(var1_c),
         var1_c = fct_recode(var1_c,
                             "Present" = "1",
                             "Absent" = "2")
  ) %>%
  mutate(var2_c = 0) %>%
  .[!is.na(Solar.R), var2_c := 1] %>%
  .[is.na(Solar.R), var2_c := 2] %>%
  mutate(var2_c = as.factor(var2_c),
         var2_c = fct_recode(var2_c,
                             "Present" = "1",
                             "Absent" = "2")
  ) %>%
  gtsummary::tbl_cross(.,
                       var2_c, var1_c,
                       percent = "no"
  )
Output looks like this:
I would appreciate it if anyone can guide me on this. Thank you!
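I think this should work for you. Inside your function, var1 and var2 hold the strings "Ozone" and "Solar.R", so is.na(var1) tests the string itself rather than the column. It is always FALSE, no row is ever set to 2, and fct_recode() then warns about the unknown level. The version below takes the column names as strings and selects them with all_of(). Note that data is now the first argument.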
absent_2by2 <- function(data, var1, var2) {
  # recode var1 and var2 as binary factors flagging NA values
  data <-
    dplyr::mutate(
      data,
      dplyr::across(
        .cols = dplyr::all_of(c(var1, var2)),
        .fns = ~factor(is.na(.),
                       levels = c(FALSE, TRUE),
                       labels = c("Present", "Absent"))
      )
    )
  # cross tabulate missing values
  gtsummary::tbl_cross(data, row = dplyr::all_of(var1), col = dplyr::all_of(var2))
}
absent_2by2(gtsummary::trial, "age", "trt")
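If it helps to see the underlying idea without any packages, here is a minimal base-R sketch using the airquality data from the question. The helper name absent_flag and the object tab are made up for illustration and are not part of gtsummary or the function above. It only shows the difference between testing the string and testing the column that the string names.

```r
# Hypothetical helper for illustration only: flag missing values in one column,
# given the column name as a string.
absent_flag <- function(data, var) {
  # data[[var]] looks the column up by name; is.na(var) on its own would
  # test the string "Ozone" instead of the Ozone column
  factor(is.na(data[[var]]),
         levels = c(FALSE, TRUE),
         labels = c("Present", "Absent"))
}

# Testing the string itself is never NA
print(is.na("Ozone"))  # FALSE

ozone   <- absent_flag(airquality, "Ozone")
solar_r <- absent_flag(airquality, "Solar.R")

# 2x2 cross-tabulation of where the missing values overlap
tab <- table(Ozone = ozone, Solar.R = solar_r)
print(tab)
```

With the function above, the equivalent call on the question's data should be absent_2by2(airquality, "Ozone", "Solar.R").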