I am new to for loops.
I am using survival data, let's say:
library(tidyverse)
library(riskRegression)
data(Melanoma)
My nested for loop should:
i
-timesj
-percentage of the entire Melanoma
-dataExplanation:
A Cox regression may be written like:
coxph(Surv(time, status == 1) ~ ici + age, data = Melanoma)
Consider
table(Melanoma$ici)
0 1 2 3
17 59 107 22
I want to investigate how the Cox regression is affected if we allocate different percentages (eg. 5% and 10% - the j
) of Melanoma$ici == 3
to Melanoma$ici == 4
using a random sampling i'th times (i
).
Basically I want to loop this code (note j
) with a new set.seed()
in i
-times
Melanoma_new$new_ici[sample(which(Melanoma_new$new_ici == 3), round(j * length(which(Melanoma_new$new_ici == 3))))] = 4
Create vectors for the loop
nr <- c()
auc_baseline <- c()
auc_new <- c()
delta_auc <- c()
auc_p_contrast <- c()
all_brier <- c()
Here's what I have, but that does not work:
for(j in seq(0.05, 0.1, 0.05)){ ### the proportion that should be chosen for random sampling
for(i in 1:3){
Melanoma_new <- Melanoma %>%
mutate(ici = as.numeric(ici),
new_ici = as.numeric(ici))
set.seed(i) ### set new random samlping
Melanoma_new$ici[sample(which(Melanoma_new$ici == 3), round(j * length(which(Melanoma_new$ici == 3))))] = 4
ith_cox <- coxph(Surv(time, status == 1) ~ ici + age, data = Melanoma_new, x = TRUE)
ith_cox_new <- coxph(Surv(time, status == 1) ~ new_ici + age, data = Melanoma_new, x = TRUE)
u_i <- Score(list("baseline" = ith_cox,
"newWHO" = ith_cox_new),
Surv(time, status == 1) ~ 1,
data = Melanoma_new,
times = c(300, 600),
plots = "cal",
B = 100,
seed = 1,
split.method = "loob",
metrics = c("auc", "brier")
)
### NOT SURE ON THIS PART
nr[i] <- i
auc_baseline[i] <- u_i$AUC$score$AUC
auc_new[i] <- u_i$AUC$score$AUC
delta_auc[i] <- u_i$AUC$contrasts$delta.AUC
auc_p_contrast[i] <- u_i$AUC$contrasts$p
all_brier[i] <- u_i$Brier$score$Brier
}
}
Finally I want to write a dataframe that looks like this:
data.frame(
seed_i = nr, #write what i / seed_nr that was used to generate this estimate
perc_j = #not sure what to write, but I want to know what j was used here in obtaining this estimate
AUC_baseline = auc_baseline*100,
AUC_new = auc_new*100,
Best_AUC = ifelse(auc_baseline > auc_new, "Baseline", "New"),
AUC_p = ifelse(auc_p_contrast < 0.05, sprintf("%.05f", round(auc_p_contrast, digits = 5)), "")
)
EDIT
So I figured out that the NA
probably was related to setting Score::times too low, I have changed it to times = c(300, 600)
.
I will therefore direct this topic to the subtle question I asked in terms of storing the results. As you can see, I am currently storing my results in vectors that later use to generate a data.frame
. I need to store the following from the for-loop: u_i$AUC$score$AUC
, u_i$AUC$contrasts$delta.AUC
, u_i$AUC$contrasts$p
, and the i
'th seed that generated the AUC in Score, and the corresponding j
'th percentage.
How about functionalisation? Does this give what you want?
# Also need to call survival
library(tidyverse)
library(riskRegression)
library(survival)
data(Melanoma)
# Produce a function that outputs the list of parameters
cox_reg_function <- function(j, i) {
Melanoma_new <- Melanoma %>%
mutate(ici = as.numeric(ici),
new_ici = as.numeric(ici))
set.seed(i) ### set new random samlping
Melanoma_new$ici[sample(which(Melanoma_new$ici == 3), round(j * length(which(Melanoma_new$ici == 3))))] = 4
ith_cox <- coxph(Surv(time, status == 1) ~ ici + age, data = Melanoma_new, x = TRUE)
ith_cox_new <- coxph(Surv(time, status == 1) ~ new_ici + age, data = Melanoma_new, x = TRUE)
u_i <- Score(list("baseline" = ith_cox,
"newWHO" = ith_cox_new),
Surv(time, status == 1) ~ 1,
data = Melanoma_new,
times = c(300, 600),
plots = "cal",
B = 100,
seed = 1,
split.method = "loob",
metrics = c("auc", "brier")
)
out <- tibble(prop = j, # Keep track of the inputs across the iterations
seed = i, # ditto
auc_new = u_i$AUC$score$AUC,
delta_auc = u_i$AUC$contrasts$delta.AUC,
auc_p_contrast = u_i$AUC$contrasts$p,
all_brier = u_i$Brier$score$Brier)
return(out)
}
# Build dataframe of probabilities and seeds
df <- expand_grid(j = seq(0.05, 0.1, 0.05), i = 1:3)
# Run function against each combination
out <- map2_dfr(df$j, df$i, cox_reg_function)
> out
# A tibble: 36 × 6
prop seed auc_new delta_auc auc_p_contrast all_brier
<dbl> <int> <dbl> <dbl> <dbl> <dbl>
1 0.05 1 0.410 0.00105 0.986 0.0300
2 0.05 1 0.394 0.00851 0.888 0.0482
3 0.05 1 0.411 0.00745 0.0686 0.0302
4 0.05 1 0.437 0.0425 0.488 0.0489
5 0.05 1 0.419 0.0544 0.379 0.0302
6 0.05 1 0.449 0.0119 0.0108 0.0488
7 0.05 2 0.410 0.0109 0.857 0.0300
8 0.05 2 0.394 0.00851 0.888 0.0482
9 0.05 2 0.421 -0.00242 0.607 0.0301
10 0.05 2 0.436 0.0421 0.455 0.0487
# … with 26 more rows
# ℹ Use `print(n = ...)` to see more rows
NB I applied fewer cycles to speed up the process, so values may vary to the actual output