Most questions related to traditional loops in R are explained away by using functionals with less code and, generally, being more flexible.
However, please correct me, I feel when the order of iterations are important, for loops would still dominate.
In my case, I would like to build a sequentially and cumulatively adjusted logistic regression model, store the OR/CIs along with a column showing the what are being adjusted for. This is my expected output:
Model OR CI
Biomarker
+Age
+Sex
+Smoking
Here's what I did:
df1 <- subset(df, select = c(age_cat, is_female, smoking_category,
bmi_calc, has_diabetes, sbp_mean,
alcohol_category, highest_education,
occupation, household_income))
model <- data.frame(NULL)
for (i in seq_along(df1)) {
model <- exp((cbind(OR = coef(glm(as.formula(paste("istroke ~ log2(hscrp_mgl)", i, sep = "+")),
family=binomial, data=df)),
confint(glm(as.formula(paste("istroke ~ log2(hscrp_mgl)", i, sep = "+")),
family=binomial, data=df)))))
}
My outcome variable is stroke (istroke, 0 or 1). My exposure of interest is the biomarker (hscrp_mgl). I know I'm making a fundamental mistake somewhere. I looked for in other SO posts but most of them don't want sequentially and cumulatively adjusted regression models.
Please let me know if this is a duplicate, nonetheless and if anything's unclear.
EDIT
My original dataset df contains all the variables of df1, my outcome variable and then some. Here is a reproducible sample of it:
age_cat is_female smoking_category bmi_calc has_diabetes sbp_mean istroke
(59,69] 0 4 19.6 0 103.5 0
(59,69] 1 1 19.1 0 138 0
(29,59] 0 4 26.8 0 155.5 0
(29,59] 0 1 23.1 0 130 1
(29,59] 1 1 22.7 0 126 1
(59,69] 0 4 25 0 182.5 0
(29,59] 1 1 20 0 96 1
(29,59] 1 2 23.9 0 134.5 0
(59,69] 0 4 24.4 0 160.5 1
EDIT A more reproducible example:
df <- data.frame(age = c(50, 60, 50, 40, 70, 90, 30),
gender = c(0, 1, 1, 0, 1, 1, 1),
smoke = c(4, 3, 2, 1, 4, 3, 4),
BMI = c(19, 20, 21, 22, 23, 24, 25),
SBP = c(100, 120, 140, 110, 120, 130, 120),
diab = c(0, 1, 1, 1, 0, 1, 1),
stroke = c(0, 1, 0, 0, 1, 1, 1))
dput(df)
structure(list(age = c(50, 60, 50, 40, 70, 90, 30), gender = c(0,
1, 1, 0, 1, 1, 1), smoke = c(4, 3, 2, 1, 4, 3, 4), BMI = c(19,
20, 21, 22, 23, 24, 25), SBP = c(100, 120, 140, 110, 120, 130,
120), diab = c(0, 1, 1, 1, 0, 1, 1), stroke = c(0, 1, 0, 0, 1,
1, 1)), .Names = c("age", "gender", "smoke", "BMI", "SBP", "diab",
"stroke"), row.names = c(NA, -7L), class = "data.frame")
Actually, lapply
might be a better approach in your case over for
as it can return a collection of data.frames for final row bind instead of expanding model iteratively.
Below example randomizes hscrp_mgl as it was not in posted data. So ignore results but consider process. Additionally, confidence interval is split between low and high in different columns.
set.seed(456)
df <- data.frame(hscrp_mgl = abs(rnorm(250)),
age = sample(100, 1000, replace=TRUE),
gender = sample(0:1, 1000, replace=TRUE),
smoke = sample(1:4, 1000, replace=TRUE),
BMI = sample(19:25, 1000, replace=TRUE),
SBP = sample(c(100, 120, 140, 110, 120, 130, 120),
1000, replace=TRUE),
diab = sample(0:1, 1000, replace=TRUE),
stroke = sample(0:1, 1000, replace=TRUE))
# ITERATE THROUGH COLUMN NUMBERS (SUBSETTING OUT FIRST AND LAST)
modeldfs <- lapply(seq_along(df)[3:ncol(df)-1], function(i) {
strf <- paste("stroke ~ log2(hscrp_mgl)",
paste(names(df)[2:i], collapse = "+"), sep = "+")
print(strf)
# FIT DYNAMIC CUMULATIVE FORMULA USING names() TO PASS IN COLUMN NAME
fit <- glm(as.formula(strf), family=binomial, data=df)
# BIND MODEL STATS
data.frame(OR = exp(coef(fit)[i+1]),
CI_2.5 = exp(confint(fit)[i+1,1]),
CI_97.5 = exp(confint(fit)[i+1,2]))
})
model <- do.call(rbind, modeldfs)
model
Output
[1] "stroke ~ log2(hscrp_mgl)+age"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI+SBP"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI+SBP+diab"
# Waiting for profiling to be done...
# Waiting for profiling to be done...
# > model <- do.call(rbind, modeldfs)
# > model
OR CI_2.5 CI_97.5
age 1.003285 0.9989043 1.007701
gender 1.067117 0.8318796 1.369055
smoke 1.005926 0.9005196 1.123717
BMI 1.011281 0.9505659 1.075928
SBP 1.003252 0.9929368 1.013692
diab 1.139586 0.8880643 1.462925