Search code examples
rcountregressionpool

Fitting several zero-inflated negbin models and get pooled results


I have a dataset with patient data. Some of the patients have multiple stays in the hospital, but for the observations to be independent (I tried a multi-level model, but it's not possible with this data) I created a dataset, where for each patient that has multiple stays only one stay is selected randomly. Like this, I created 100 datasets with random stays for these patients. My dependent variable is a count variable and a zero-inflated negative binomial model fits best. I already managed to run the regression model on each of the datasets (the datasets are identified by the variable "sample"), but I don't know how to get a pooled result for all of these 100 regressions. I would like to get the pooled results of the count model and of the zero-inflated model for every predictor. I'm using: library(dplyr); library(tidyverse); library(pscl); library(broom); library(jtools); library(mice)

The pool function is from mice.

I created the combined dataset like this:

set.seed(12345)

Combined_randcase <- bind_rows(replicate(100, cohort_1 %>% group_by(patient) %>%
                                  slice_sample(n=1, replace = TRUE), simplify=F), .id="sample")

Combined_randcase <- data.frame(as.list(Combined_randcase))

I ran the ZINB regression model on each dataset, grouped by "sample", like this (using broom package):

regr_comb_randcase.zeroinfl = Combined_randcase %>% 
nest_by(sample) %>% 
mutate(model = list(zeroinfl(formula = cm_number ~ after_wm + age + gender_male + ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication | age + gender_male + ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication, data = cohort_1, na.action = na.exclude, dist = "negbin"))) 
%>%
  summarise(tidy(model)) 

That's how I tried to get pooled results:

models.zeroinfl <- regr_comb_randcase.zeroinfl$model

pool_results.zeroinfl <- pool(regr_comb_randcase.zeroinfl)

When running the second line, I get this error:

Error: No tidy method for objects of class character

For another logistic regression model, I did this successfully:

regr_comb_randcase.log = Combined_randcase %>% 
group_by(sample) %>% 
do(model = glm(cm ~ after_wm + age + gender_male +ref_mode_police + ref_lg_invol + ref_reas_selfharm + ref_reas_aggrpers + comm_limited + duration_days + diagnosis_personality + diagnosis_psychosis + diagnosis_mania + diagnosis_substance + intoxication, data = ., family = binomial()))

models <- regr_comb_randcase.log$model

pool_results <- pool(models)

summary(pool_results)

Output of dput(cohort_1_example) (a shortened version of my dataset) for reproducibility:

structure(list(case = c("20001879", "20009253", "20003748", "20002321", 
"20001662", "1910967", "20008058", "20010686", "20010938", "20009508", 
"20002307", "20010105", "210098181", "21009818", "210100261", 
"21010026", "21000865", "21002199", "1906803", "1907642", "20008274", 
"21000858", "21004557", "1910669", "21004451", "21000202", "21000812", 
"21001006", "21001143", "21001423", "1906820", "21000448", "21002128", 
"21002666", "21003560", "1907070", "20011121", "1907614", "20002748", 
"20010645", "21001363", "1908906", "1910981", "1905926", "21002429", 
"21004264", "20011209", "20010442", "20009977", "1906382", "1909409", 
"1908904", "1910516", "20001534", "20011201", "1907432", "1908332", 
"1906356", "20011026", "20008206", "20000809", "1910664", "1908673", 
"1907610", "1911046", "20008505", "20009385", "21000530", "1909620", 
"1909730", "1910988", "20009899", "1907282", "1906671", "20007870", 
"1910749", "20010782", "20009808", "20003311", "1910722", "1910529", 
"1906638", "1906861", "1906956", "1910743", "20002057", "21000891", 
"20010349", "20008503", "1906093", "1910662", "20008093", "20010683", 
"20008787", "20003631", "20007796", "20008089", "21004141", "20010177", 
"20001316", "1909809", "20001875", "20009552", "20001443", "21000419", 
"20003106", "1909773", "21004600", "20008105", "21002070", "1908245", 
"1909860", "21004209", "21003022", "20003151", "20011037", "21001966", 
"20009902", "1906202", "1910009", "1910777", "20010294", "1910072"
), patient = c("10", "11", "100", "100", "101", "102", "103", 
"105", "106", "107", "108", "11", "11", "11", "11", "11", "1000", 
"1001", "1002", "1002", "1003", "1003", "1004", "1005", "1005", 
"1006", "1008", "1009", "1009", "1009", "1011", "1012", "1013", 
"1013", "1013", "1014", "1016", "1017", "1018", "1020", "1020", 
"1021", "1022", "1023", "1026", "1026", "1029", "1030", "1033", 
"1035", "1036", "1037", "1037", "1037", "1039", "1041", "1041", 
"1042", "1042", "1043", "1044", "1045", "1046", "1047", "1048", 
"1049", "1050", "1053", "1054", "1056", "1056", "1057", "1058", 
"1060", "1061", "1064", "1064", "1064", "1065", "1066", "1067", 
"1067", "1067", "1067", "1069", "1071", "1072", "1073", "1074", 
"1075", "1075", "1076", "1077", "1078", "1079", "1080", "1081", 
"1082", "1083", "1086", "1087", "1087", "1088", "1089", "1089", 
"1090", "1091", "1091", "1092", "1093", "1094", "1094", "1095", 
"1096", "1098", "1098", "1098", "1099", "1048", "1048", "1021", 
"1018", "1011"), cm = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0), cm_number = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 3, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 5, 0, 0, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 2, 2, 0, 0, 1, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 0, 0, 0), total_cm_duration = c(0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 165.000000003492, 0, 0, 0, 0, 0, 0, 
0, 0, 174.999999994179, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 259.999999998836, 
720, 0, 0, 0, 0, 0, 60.0000000069849, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 815.000000005821, 0, 0, 10865.0000000023, 
0, 0, 0, 0, 0, 0, 420.000000006985, 0, 0, 0, 0, 0, 200.000000002328, 
0, 0, 0, 0, 0, 0, 239.999999996508, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1084.99999999534, 0, 0, 145.000000001164, 0, 0, 789.999999997672, 
435.000000003492, 0, 0, 60.0000000069849, 0, 0, 0, 0, 775.000000001164, 
0, 0, 0, 0, 0, 0, 0), after_wm = c(0, 1, 0, 0, 0, 0, 1, 1, 1, 
1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 
0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 
0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 
1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 
0, 1, 1, 1, 0, 0, 0, 1, 0), age = c(26, 27, 53, 53, 26, 28, 30, 
57, 39, 50, 49, 27, 28, 28, 27, 27, 89, 18, 22, 22, 21, 21, 58, 
35, 36, 63, 44, 35, 35, 35, 25, 24, 36, 36, 36, 62, 50, 21, 55, 
23, 23, 44, 53, 71, 39, 39, 79, 47, 81, 43, 39, 21, 22, 22, 79, 
22, 22, 33, 35, 86, 27, 42, 20, 30, 25, 22, 26, 62, 54, 46, 46, 
46, 79, 39, 21, 63, 64, 64, 31, 59, 70, 70, 70, 70, 49, 37, 49, 
63, 74, 38, 39, 74, 50, 72, 61, 80, 51, 45, 67, 45, 76, 76, 61, 
30, 31, 35, 48, 49, 45, 30, 76, 76, 20, 18, 20, 20, 21, 51, 24, 
24, 45, 55, 25), gender_male = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 
1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 
0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 
0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 
1, 1, 1, 0, 0, 0, 0, 0), ref_mode_police = c(0, 0, 0, 0, 0, 0, 
1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 
1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 
0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0), ref_lg_invol = c(0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 
1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0), ref_reas_selfharm = c(1, 
1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 
1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 
1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 
0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 
1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 
0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0), ref_reas_aggrpers = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 
0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), comm_limited = c(0, 
0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), duration_days = c(41, 
1, 2, 42, 3, 46, 3, 8, 1, 1, 21, 64, 25, 2, 25, 2, 22, 26, 7, 
29, 101, 119, 153, 2, 2, 51, 10, 2, 6, 49, 83, 5, 1, 8, 1, 36, 
71, 1, 7, 9, 166, 41, 2, 76, 12, 1, 25, 40, 4, 0, 2, 28, 1, 3, 
49, 29, 54, 95, 119, 29, 28, 26, 43, 1, 15, 121, 22, 28, 73, 
13, 39, 1, 119, 14, 73, 18, 124, 32, 2, 120, 67, 2, 2, 8, 29, 
27, 34, 32, 112, 6, 8, 38, 118, 24, 38, 20, 2, 1, 2, 9, 1, 21, 
42, 57, 49, 1, 1, 1, 35, 2, 45, 23, 64, 29, 2, 6, 56, 0, 5, 3, 
58, 51, 2), diagnosis_personality = c(0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), diagnosis_psychosis = c(1, 1, 
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 
0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 
1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 
1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 
1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 
1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1), diagnosis_mania = c(0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 
0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), diagnosis_substance = c(0, 
0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 
0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1), intoxication = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1)), row.names = c(NA, 
-123L), class = c("tbl_df", "tbl", "data.frame"))

I could find information on doing this with a linear and a logistic model, but not on a zero-inflated negbin model. Maybe that's why tidy is not working? Any help is much appreciated.


Solution

  • We can fit mixed/multilevel models to data with singleton groups; the main constraint is whether there is enough information overall to make estimating the among-group variances feasible.

    Among mainstream packages, lme4 can fit LMMs, logistic GLMMs, and negative binomial mixed models (albeit a bit slowly, and not zero-inflated mixed models without a lot of work: see e.g. here). glmmTMB can handle all of the above. brms can do anything glmmTMB can do, plus the kitchen sink, but is slower (because Bayesian MCMC) and may need you to get into the weeds of Bayesian/MCMC sampling.

    The examples below use your data subset, with considerably simplified models. Your sample data has 86 patients, 61 singletons, 123 total observations. This seems to be nearly enough to fit the simplified models (below); we do run into some of the typical not-quite-enough-data problems with the ZINB fit (zero-inflation term converges to zero probability, NB dispersion parameter converges to a Poisson distribution) and the logistic fit (singular fit, i.e. the among-patient variance converges to zero). These problems are much less likely to occur if your full data set is large ...

    The first bit of machinery (besides loading packages) is cosmetic, to avoid too much repetitive code and make it easier to see what common set of predictors is included across all models.

    library(lme4)
    library(glmmTMB)
    ## list of all of the fixed effect predictors
    fix_vars <- c("after_wm", "age", "gender_male", "ref_mode_police")
    
    ## 'resp' is the name of the response variable (character)
    ff <- function(resp) {
       reformulate(c(fix_vars, "(1|patient)"), resp = resp)
    }
    
    ## Gaussian/LMM
    lme4::lmer(ff("total_cm_duration"), data = dd)
    
    ## NB/no zero-inflation
    glmmTMB::glmmTMB(ff("cm_number"),
                     family = nbinom2,
                     data = dd)
    
    ## NB with (simple) zero-inflation
    glmmTMB::glmmTMB(ff("cm_number"),
                     family = nbinom2,
                     ## could use zi = ff(NULL) to include all FE predictors
                     ##  as ZI predictors as well ...
                     zi = ~1,
                     data = dd)
    
    ## logistic
    lme4::glmer(ff("cm"),
                family = binomial,
                data = dd)