r: nested for-loop only generates NA when summarizing output

I am new to for loops.

I am using survival data, let's say:

library(tidyverse)
library(riskRegression)
data(Melanoma)

My nested for loop should:

re-allocate a random sample in a total i-times
the random sample comprise j-percentage of the entire Melanoma-data

Explanation:

A Cox regression may be written like:

coxph(Surv(time, status == 1) ~ ici + age, data = Melanoma)

Consider

table(Melanoma$ici)

  0   1   2   3 
 17  59 107  22

I want to investigate how the Cox regression is affected if we allocate different percentages (eg. 5% and 10% - the j) of Melanoma$ici == 3 to Melanoma$ici == 4 using a random sampling i'th times (i).

Basically I want to loop this code (note j) with a new set.seed() in i-times

Melanoma_new$new_ici[sample(which(Melanoma_new$new_ici == 3), round(j * length(which(Melanoma_new$new_ici == 3))))] = 4

Create vectors for the loop

nr <- c()
auc_baseline <- c()
auc_new <- c()
delta_auc <- c()
auc_p_contrast <- c()
all_brier <- c()

Here's what I have, but that does not work:

for(j in seq(0.05, 0.1, 0.05)){ ### the proportion that should be chosen for random sampling
  
  for(i in 1:3){
    Melanoma_new <- Melanoma %>%
      mutate(ici = as.numeric(ici),
             new_ici = as.numeric(ici)) 
    
    set.seed(i) ### set new random samlping  
    
    Melanoma_new$ici[sample(which(Melanoma_new$ici == 3), round(j * length(which(Melanoma_new$ici == 3))))] = 4
    
    ith_cox <- coxph(Surv(time, status == 1) ~ ici + age, data = Melanoma_new, x = TRUE)
    ith_cox_new <- coxph(Surv(time, status == 1) ~ new_ici + age, data = Melanoma_new, x = TRUE)
    
    u_i <- Score(list("baseline" = ith_cox,
                      "newWHO" = ith_cox_new),
                 Surv(time, status == 1) ~ 1,
                 data = Melanoma_new,
                 times = c(300, 600),
                 plots = "cal",
                 B = 100,
                 seed = 1,
                 split.method = "loob",
                 metrics = c("auc", "brier")
    )
    
    ### NOT SURE ON THIS PART
    nr[i] <- i          
    auc_baseline[i] <- u_i$AUC$score$AUC
    auc_new[i] <- u_i$AUC$score$AUC
    delta_auc[i] <- u_i$AUC$contrasts$delta.AUC
    auc_p_contrast[i] <- u_i$AUC$contrasts$p
    all_brier[i] <- u_i$Brier$score$Brier
  }
}

Finally I want to write a dataframe that looks like this:

data.frame(
  seed_i = nr, #write what i / seed_nr that was used to generate this estimate
  perc_j = #not sure what to write, but I want to know what j was used here in obtaining this estimate
  AUC_baseline = auc_baseline*100,
  AUC_new = auc_new*100,
  Best_AUC = ifelse(auc_baseline > auc_new, "Baseline", "New"),
  AUC_p = ifelse(auc_p_contrast < 0.05, sprintf("%.05f", round(auc_p_contrast, digits = 5)), "")
)

EDIT So I figured out that the NA probably was related to setting Score::times too low, I have changed it to times = c(300, 600). I will therefore direct this topic to the subtle question I asked in terms of storing the results. As you can see, I am currently storing my results in vectors that later use to generate a data.frame. I need to store the following from the for-loop: u_i$AUC$score$AUC, u_i$AUC$contrasts$delta.AUC, u_i$AUC$contrasts$p, and the i'th seed that generated the AUC in Score, and the corresponding j'th percentage.

Solution

How about functionalisation? Does this give what you want?

# Also need to call survival
library(tidyverse)
library(riskRegression)
library(survival)
data(Melanoma)

# Produce a function that outputs the list of parameters
  cox_reg_function <- function(j, i) {
    Melanoma_new <- Melanoma %>%
      mutate(ici = as.numeric(ici),
             new_ici = as.numeric(ici)) 
    
    set.seed(i) ### set new random samlping  
    
    Melanoma_new$ici[sample(which(Melanoma_new$ici == 3), round(j * length(which(Melanoma_new$ici == 3))))] = 4
    
    ith_cox <- coxph(Surv(time, status == 1) ~ ici + age, data = Melanoma_new, x = TRUE)
    ith_cox_new <- coxph(Surv(time, status == 1) ~ new_ici + age, data = Melanoma_new, x = TRUE)
    
    u_i <- Score(list("baseline" = ith_cox,
                      "newWHO" = ith_cox_new),
                 Surv(time, status == 1) ~ 1,
                 data = Melanoma_new,
                 times = c(300, 600),
                 plots = "cal",
                 B = 100, 
                 seed = 1,
                 split.method = "loob",
                 metrics = c("auc", "brier")
    )
    
    out <- tibble(prop = j, # Keep track of the inputs across the iterations
                  seed = i, # ditto
                  auc_new = u_i$AUC$score$AUC, 
                  delta_auc = u_i$AUC$contrasts$delta.AUC, 
                  auc_p_contrast = u_i$AUC$contrasts$p, 
                  all_brier = u_i$Brier$score$Brier)
    
    return(out)
  }

# Build dataframe of probabilities and seeds
  df <- expand_grid(j = seq(0.05, 0.1, 0.05), i = 1:3)

# Run function against each combination
  out <- map2_dfr(df$j, df$i, cox_reg_function)


 > out
# A tibble: 36 × 6
    prop  seed auc_new delta_auc auc_p_contrast all_brier
   <dbl> <int>   <dbl>     <dbl>          <dbl>     <dbl>
 1  0.05     1   0.410   0.00105         0.986     0.0300
 2  0.05     1   0.394   0.00851         0.888     0.0482
 3  0.05     1   0.411   0.00745         0.0686    0.0302
 4  0.05     1   0.437   0.0425          0.488     0.0489
 5  0.05     1   0.419   0.0544          0.379     0.0302
 6  0.05     1   0.449   0.0119          0.0108    0.0488
 7  0.05     2   0.410   0.0109          0.857     0.0300
 8  0.05     2   0.394   0.00851         0.888     0.0482
 9  0.05     2   0.421  -0.00242         0.607     0.0301
10  0.05     2   0.436   0.0421          0.455     0.0487
# … with 26 more rows
# ℹ Use `print(n = ...)` to see more rows

NB I applied fewer cycles to speed up the process, so values may vary to the actual output