I have a dataset with patient data. Some of the patients have multiple stays in the hospital, but for the observations to be independent (I tried a multi-level model, but it's not possible with this data) I created a dataset, where for each patient that has multiple stays only one stay is selected randomly. Like this, I created 100 datasets with random stays for these patients. My dependent variable is a count variable and a zero-inflated negative binomial model fits best.
I already managed to run the regression model on each of the datasets (the datasets are identified by the variable "sample"), but I don't know how to get a pooled result for all of these 100 regressions. I would like to get the pooled results of the count model and of the zero-inflated model for every predictor.
I'm using:
library(dplyr); library(tidyverse); library(pscl); library(broom); library(jtools); library(mice)
The pool
function is from mice
.
I created the combined dataset like this:
set.seed(12345)
Combined_randcase <- bind_rows(replicate(100, cohort_1 %>% group_by(patient) %>%
slice_sample(n=1, replace = TRUE), simplify=F), .id="sample")
Combined_randcase <- data.frame(as.list(Combined_randcase))
I ran the ZINB regression model on each dataset, grouped by "sample", like this (using broom
package):
regr_comb_randcase.zeroinfl = Combined_randcase %>%
nest_by(sample) %>%
mutate(model = list(zeroinfl(formula = cm_number ~ after_wm + age + gender_male + ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication | age + gender_male + ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication, data = cohort_1, na.action = na.exclude, dist = "negbin")))
%>%
summarise(tidy(model))
That's how I tried to get pooled results:
models.zeroinfl <- regr_comb_randcase.zeroinfl$model
pool_results.zeroinfl <- pool(regr_comb_randcase.zeroinfl)
When running the second line, I get this error:
Error: No tidy method for objects of class character
For another logistic regression model, I did this successfully:
regr_comb_randcase.log = Combined_randcase %>%
group_by(sample) %>%
do(model = glm(cm ~ after_wm + age + gender_male +ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication, data = ., family = binomial()))
models <- regr_comb_randcase.log$model
pool_results <- pool(models)
summary(pool_results)
Output of dput(cohort_1_example)
(a shortened version of my dataset) for reproducibility:
structure(list(case = c("20001879", "20009253", "20003748", "20002321",
"20001662", "1910967", "20008058", "20010686", "20010938", "20009508",
"20002307", "20010105", "210098181", "21009818", "210100261",
"21010026", "21000865", "21002199", "1906803", "1907642", "20008274",
"21000858", "21004557", "1910669", "21004451", "21000202", "21000812",
"21001006", "21001143", "21001423", "1906820", "21000448", "21002128",
"21002666", "21003560", "1907070", "20011121", "1907614", "20002748",
"20010645", "21001363", "1908906", "1910981", "1905926", "21002429",
"21004264", "20011209", "20010442", "20009977", "1906382", "1909409",
"1908904", "1910516", "20001534", "20011201", "1907432", "1908332",
"1906356", "20011026", "20008206", "20000809", "1910664", "1908673",
"1907610", "1911046", "20008505", "20009385", "21000530", "1909620",
"1909730", "1910988", "20009899", "1907282", "1906671", "20007870",
"1910749", "20010782", "20009808", "20003311", "1910722", "1910529",
"1906638", "1906861", "1906956", "1910743", "20002057", "21000891",
"20010349", "20008503", "1906093", "1910662", "20008093", "20010683",
"20008787", "20003631", "20007796", "20008089", "21004141", "20010177",
"20001316", "1909809", "20001875", "20009552", "20001443", "21000419",
"20003106", "1909773", "21004600", "20008105", "21002070", "1908245",
"1909860", "21004209", "21003022", "20003151", "20011037", "21001966",
"20009902", "1906202", "1910009", "1910777", "20010294", "1910072"
), patient = c("10", "11", "100", "100", "101", "102", "103",
"105", "106", "107", "108", "11", "11", "11", "11", "11", "1000",
"1001", "1002", "1002", "1003", "1003", "1004", "1005", "1005",
"1006", "1008", "1009", "1009", "1009", "1011", "1012", "1013",
"1013", "1013", "1014", "1016", "1017", "1018", "1020", "1020",
"1021", "1022", "1023", "1026", "1026", "1029", "1030", "1033",
"1035", "1036", "1037", "1037", "1037", "1039", "1041", "1041",
"1042", "1042", "1043", "1044", "1045", "1046", "1047", "1048",
"1049", "1050", "1053", "1054", "1056", "1056", "1057", "1058",
"1060", "1061", "1064", "1064", "1064", "1065", "1066", "1067",
"1067", "1067", "1067", "1069", "1071", "1072", "1073", "1074",
"1075", "1075", "1076", "1077", "1078", "1079", "1080", "1081",
"1082", "1083", "1086", "1087", "1087", "1088", "1089", "1089",
"1090", "1091", "1091", "1092", "1093", "1094", "1094", "1095",
"1096", "1098", "1098", "1098", "1099", "1048", "1048", "1021",
"1018", "1011"), cm = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0), cm_number = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 3, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 5, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 2, 2, 0, 0, 1, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0), total_cm_duration = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 165.000000003492, 0, 0, 0, 0, 0, 0,
0, 0, 174.999999994179, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 259.999999998836,
720, 0, 0, 0, 0, 0, 60.0000000069849, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 815.000000005821, 0, 0, 10865.0000000023,
0, 0, 0, 0, 0, 0, 420.000000006985, 0, 0, 0, 0, 0, 200.000000002328,
0, 0, 0, 0, 0, 0, 239.999999996508, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1084.99999999534, 0, 0, 145.000000001164, 0, 0, 789.999999997672,
435.000000003492, 0, 0, 60.0000000069849, 0, 0, 0, 0, 775.000000001164,
0, 0, 0, 0, 0, 0, 0), after_wm = c(0, 1, 0, 0, 0, 0, 1, 1, 1,
1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0,
0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1,
0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,
1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1,
0, 1, 1, 1, 0, 0, 0, 1, 0), age = c(26, 27, 53, 53, 26, 28, 30,
57, 39, 50, 49, 27, 28, 28, 27, 27, 89, 18, 22, 22, 21, 21, 58,
35, 36, 63, 44, 35, 35, 35, 25, 24, 36, 36, 36, 62, 50, 21, 55,
23, 23, 44, 53, 71, 39, 39, 79, 47, 81, 43, 39, 21, 22, 22, 79,
22, 22, 33, 35, 86, 27, 42, 20, 30, 25, 22, 26, 62, 54, 46, 46,
46, 79, 39, 21, 63, 64, 64, 31, 59, 70, 70, 70, 70, 49, 37, 49,
63, 74, 38, 39, 74, 50, 72, 61, 80, 51, 45, 67, 45, 76, 76, 61,
30, 31, 35, 48, 49, 45, 30, 76, 76, 20, 18, 20, 20, 21, 51, 24,
24, 45, 55, 25), gender_male = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0,
1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1,
1, 1, 1, 0, 0, 0, 0, 0), ref_mode_police = c(0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0), ref_lg_invol = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0,
1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0), ref_reas_selfharm = c(1,
1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1,
1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0,
1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0,
0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1,
0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0), ref_reas_aggrpers = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), comm_limited = c(0,
0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0,
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), duration_days = c(41,
1, 2, 42, 3, 46, 3, 8, 1, 1, 21, 64, 25, 2, 25, 2, 22, 26, 7,
29, 101, 119, 153, 2, 2, 51, 10, 2, 6, 49, 83, 5, 1, 8, 1, 36,
71, 1, 7, 9, 166, 41, 2, 76, 12, 1, 25, 40, 4, 0, 2, 28, 1, 3,
49, 29, 54, 95, 119, 29, 28, 26, 43, 1, 15, 121, 22, 28, 73,
13, 39, 1, 119, 14, 73, 18, 124, 32, 2, 120, 67, 2, 2, 8, 29,
27, 34, 32, 112, 6, 8, 38, 118, 24, 38, 20, 2, 1, 2, 9, 1, 21,
42, 57, 49, 1, 1, 1, 35, 2, 45, 23, 64, 29, 2, 6, 56, 0, 5, 3,
58, 51, 2), diagnosis_personality = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), diagnosis_psychosis = c(1, 1,
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0,
0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1,
1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0,
1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1,
1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1), diagnosis_mania = c(0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), diagnosis_substance = c(0,
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1,
0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1), intoxication = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1)), row.names = c(NA,
-123L), class = c("tbl_df", "tbl", "data.frame"))
I could find information on doing this with a linear and a logistic model, but not on a zero-inflated negbin model. Maybe that's why tidy is not working? Any help is much appreciated.
We can fit mixed/multilevel models to data with singleton groups; the main constraint is whether there is enough information overall to make estimating the among-group variances feasible.
Among mainstream packages, lme4
can fit LMMs, logistic GLMMs, and negative binomial mixed models (albeit a bit slowly, and not zero-inflated mixed models without a lot of work: see e.g. here). glmmTMB
can handle all of the above. brms
can do anything glmmTMB
can do, plus the kitchen sink, but is slower (because Bayesian MCMC) and may need you to get into the weeds of Bayesian/MCMC sampling.
The examples below use your data subset, with considerably simplified models. Your sample data has 86 patients, 61 singletons, 123 total observations. This seems to be nearly enough to fit the simplified models (below); we do run into some of the typical not-quite-enough-data problems with the ZINB fit (zero-inflation term converges to zero probability, NB dispersion parameter converges to a Poisson distribution) and the logistic fit (singular fit, i.e. the among-patient variance converges to zero). These problems are much less likely to occur if your full data set is large ...
The first bit of machinery (besides loading packages) is cosmetic, to avoid too much repetitive code and make it easier to see what common set of predictors is included across all models.
library(lme4)
library(glmmTMB)
## list of all of the fixed effect predictors
fix_vars <- c("after_wm", "age", "gender_male", "ref_mode_police")
## 'resp' is the name of the response variable (character)
ff <- function(resp) {
reformulate(c(fix_vars, "(1|patient)"), resp = resp)
}
## Gaussian/LMM
lme4::lmer(ff("total_cm_duration"), data = dd)
## NB/no zero-inflation
glmmTMB::glmmTMB(ff("cm_number"),
family = nbinom2,
data = dd)
## NB with (simple) zero-inflation
glmmTMB::glmmTMB(ff("cm_number"),
family = nbinom2,
## could use zi = ff(NULL) to include all FE predictors
## as ZI predictors as well ...
zi = ~1,
data = dd)
## logistic
lme4::glmer(ff("cm"),
family = binomial,
data = dd)