Search code examples
rdataframeloopsfor-loopnested-loops

r: nested for-loop only generates NA when summarizing output


I am new to for loops.

I am using survival data, let's say:

library(tidyverse)
library(riskRegression)
data(Melanoma)

My nested for loop should:

  • re-allocate a random sample in a total i-times
  • the random sample comprise j-percentage of the entire Melanoma-data

Explanation:

A Cox regression may be written like:

coxph(Surv(time, status == 1) ~ ici + age, data = Melanoma)

Consider

table(Melanoma$ici)

  0   1   2   3 
 17  59 107  22

I want to investigate how the Cox regression is affected if we allocate different percentages (eg. 5% and 10% - the j) of Melanoma$ici == 3 to Melanoma$ici == 4 using a random sampling i'th times (i).

Basically I want to loop this code (note j) with a new set.seed() in i-times

Melanoma_new$new_ici[sample(which(Melanoma_new$new_ici == 3), round(j * length(which(Melanoma_new$new_ici == 3))))] = 4

Create vectors for the loop

nr <- c()
auc_baseline <- c()
auc_new <- c()
delta_auc <- c()
auc_p_contrast <- c()
all_brier <- c()

Here's what I have, but that does not work:

for(j in seq(0.05, 0.1, 0.05)){ ### the proportion that should be chosen for random sampling
  
  for(i in 1:3){
    Melanoma_new <- Melanoma %>%
      mutate(ici = as.numeric(ici),
             new_ici = as.numeric(ici)) 
    
    set.seed(i) ### set new random samlping  
    
    Melanoma_new$ici[sample(which(Melanoma_new$ici == 3), round(j * length(which(Melanoma_new$ici == 3))))] = 4
    
    ith_cox <- coxph(Surv(time, status == 1) ~ ici + age, data = Melanoma_new, x = TRUE)
    ith_cox_new <- coxph(Surv(time, status == 1) ~ new_ici + age, data = Melanoma_new, x = TRUE)
    
    u_i <- Score(list("baseline" = ith_cox,
                      "newWHO" = ith_cox_new),
                 Surv(time, status == 1) ~ 1,
                 data = Melanoma_new,
                 times = c(300, 600),
                 plots = "cal",
                 B = 100,
                 seed = 1,
                 split.method = "loob",
                 metrics = c("auc", "brier")
    )
    
    ### NOT SURE ON THIS PART
    nr[i] <- i          
    auc_baseline[i] <- u_i$AUC$score$AUC
    auc_new[i] <- u_i$AUC$score$AUC
    delta_auc[i] <- u_i$AUC$contrasts$delta.AUC
    auc_p_contrast[i] <- u_i$AUC$contrasts$p
    all_brier[i] <- u_i$Brier$score$Brier
  }
}

Finally I want to write a dataframe that looks like this:

data.frame(
  seed_i = nr, #write what i / seed_nr that was used to generate this estimate
  perc_j = #not sure what to write, but I want to know what j was used here in obtaining this estimate
  AUC_baseline = auc_baseline*100,
  AUC_new = auc_new*100,
  Best_AUC = ifelse(auc_baseline > auc_new, "Baseline", "New"),
  AUC_p = ifelse(auc_p_contrast < 0.05, sprintf("%.05f", round(auc_p_contrast, digits = 5)), "")
) 

EDIT So I figured out that the NA probably was related to setting Score::times too low, I have changed it to times = c(300, 600). I will therefore direct this topic to the subtle question I asked in terms of storing the results. As you can see, I am currently storing my results in vectors that later use to generate a data.frame. I need to store the following from the for-loop: u_i$AUC$score$AUC, u_i$AUC$contrasts$delta.AUC, u_i$AUC$contrasts$p, and the i'th seed that generated the AUC in Score, and the corresponding j'th percentage.


Solution

  • How about functionalisation? Does this give what you want?

    # Also need to call survival
    library(tidyverse)
    library(riskRegression)
    library(survival)
    data(Melanoma)
    
    # Produce a function that outputs the list of parameters
      cox_reg_function <- function(j, i) {
        Melanoma_new <- Melanoma %>%
          mutate(ici = as.numeric(ici),
                 new_ici = as.numeric(ici)) 
        
        set.seed(i) ### set new random samlping  
        
        Melanoma_new$ici[sample(which(Melanoma_new$ici == 3), round(j * length(which(Melanoma_new$ici == 3))))] = 4
        
        ith_cox <- coxph(Surv(time, status == 1) ~ ici + age, data = Melanoma_new, x = TRUE)
        ith_cox_new <- coxph(Surv(time, status == 1) ~ new_ici + age, data = Melanoma_new, x = TRUE)
        
        u_i <- Score(list("baseline" = ith_cox,
                          "newWHO" = ith_cox_new),
                     Surv(time, status == 1) ~ 1,
                     data = Melanoma_new,
                     times = c(300, 600),
                     plots = "cal",
                     B = 100, 
                     seed = 1,
                     split.method = "loob",
                     metrics = c("auc", "brier")
        )
        
        out <- tibble(prop = j, # Keep track of the inputs across the iterations
                      seed = i, # ditto
                      auc_new = u_i$AUC$score$AUC, 
                      delta_auc = u_i$AUC$contrasts$delta.AUC, 
                      auc_p_contrast = u_i$AUC$contrasts$p, 
                      all_brier = u_i$Brier$score$Brier)
        
        return(out)
      }
    
    # Build dataframe of probabilities and seeds
      df <- expand_grid(j = seq(0.05, 0.1, 0.05), i = 1:3)
    
    # Run function against each combination
      out <- map2_dfr(df$j, df$i, cox_reg_function)
    
    
     > out
    # A tibble: 36 × 6
        prop  seed auc_new delta_auc auc_p_contrast all_brier
       <dbl> <int>   <dbl>     <dbl>          <dbl>     <dbl>
     1  0.05     1   0.410   0.00105         0.986     0.0300
     2  0.05     1   0.394   0.00851         0.888     0.0482
     3  0.05     1   0.411   0.00745         0.0686    0.0302
     4  0.05     1   0.437   0.0425          0.488     0.0489
     5  0.05     1   0.419   0.0544          0.379     0.0302
     6  0.05     1   0.449   0.0119          0.0108    0.0488
     7  0.05     2   0.410   0.0109          0.857     0.0300
     8  0.05     2   0.394   0.00851         0.888     0.0482
     9  0.05     2   0.421  -0.00242         0.607     0.0301
    10  0.05     2   0.436   0.0421          0.455     0.0487
    # … with 26 more rows
    # ℹ Use `print(n = ...)` to see more rows
    

    NB I applied fewer cycles to speed up the process, so values may vary to the actual output