
Output Kolmogorov-Smirnov test (ks.test) results from a loop into a data frame


I run ks.test inside a loop and would like to collect the results in a data frame or file, instead of printing the output of 1155 tests to the console... :-).

column_equality_stats = function(data, lab_stats1, lab_stats2, min_count=100) {
  for(i in 1:length(lab_stats1)) {
    lab_testcodes_1 = lab_stats1[i]
    lab_testcodes_2 = lab_stats2[i]
    equal_columns <- filter(data, lab_testcode==lab_testcodes_1 | lab_testcode==lab_testcodes_2)
    col1 <- equal_columns[equal_columns$lab_testcode==lab_testcodes_1, 'lab_result']
    col2 <- equal_columns[equal_columns$lab_testcode==lab_testcodes_2, 'lab_result']
    if(sum(!is.na(col1))>min_count && sum(!is.na(col2))>min_count){
      stats <- ks.test(col1, col2)
      print(stats)

    }
  }
}

I would like to have a data.frame with the following columns: the names of col1 and col2 (the two lab test codes being compared), the p-value and the D-value (the KS test statistic).

[Image: the desired ("utopian") data frame]

Thank you very much in advance!!


Solution

  • This is as far as I can go. Please edit your question and add a reproducible example.

    I need to know how your data is defined. I can't run your code! :-)

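    Anyway, the idea is to create a one-row data frame in each run of the loop and bind them all together.

    map_dfr from the purrr package does this for you: it calls a function once per element and row-binds the data frames it returns.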

    library(purrr)
    library(dplyr)  # provides filter(); without it, stats::filter() would be called instead
    
    .column_equality_stats <- function(i, data, lab_stats1, lab_stats2, min_count = 100){
    
      lab_testcodes_1 <- lab_stats1[i]
      lab_testcodes_2 <- lab_stats2[i]
      equal_columns <- filter(data, lab_testcode==lab_testcodes_1 | lab_testcode==lab_testcodes_2)
      col1 <- equal_columns[equal_columns$lab_testcode==lab_testcodes_1, 'lab_result']
      col2 <- equal_columns[equal_columns$lab_testcode==lab_testcodes_2, 'lab_result']
    
      if(sum(!is.na(col1))>min_count && sum(!is.na(col2))>min_count){
    
        stats <- ks.test(col1, col2)
        # ks.test() returns an "htest" list; the D value is stored in $statistic
        res <- data.frame(col1 = lab_testcodes_1,
                          col2 = lab_testcodes_2,
                          pvalue = stats$p.value,
                          dvalue = unname(stats$statistic))
    
      } else {res <- data.frame()}
    
      res
    
    }
    
    column_equality_stats <- function(data, lab_stats1, lab_stats2, min_count=100) {
    
      map_dfr(seq_along(lab_stats1), 
              .column_equality_stats,
              data       = data,
              lab_stats1 = lab_stats1,
              lab_stats2 = lab_stats2,
              min_count  = min_count)
    
    }
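
Since the question has no reproducible data, here is a minimal sketch of the same "one row per test, then bind" idea in base R, run on synthetic data. The data frame lab_data, its values, and the function name ks_pairs are made up for illustration; only the column names lab_testcode and lab_result are taken from the question. It is not the asker's data or a tested drop-in replacement.

```r
# Synthetic stand-in for the real data (hypothetical values; the real data was not posted).
set.seed(42)
lab_data <- data.frame(
  lab_testcode = rep(c("A", "B", "C"), each = 150),
  lab_result   = c(rnorm(150), rnorm(150), rnorm(150, mean = 3))
)

# Run one KS test per pair of test codes and return one row per tested pair.
# ks_pairs is a hypothetical helper name, not part of any package.
ks_pairs <- function(data, codes1, codes2, min_count = 100) {
  rows <- lapply(seq_along(codes1), function(i) {
    x <- data$lab_result[data$lab_testcode == codes1[i]]
    y <- data$lab_result[data$lab_testcode == codes2[i]]
    # Skip pairs with too few non-missing values, as in the original loop.
    if (sum(!is.na(x)) <= min_count || sum(!is.na(y)) <= min_count) return(NULL)
    test <- ks.test(x, y)
    data.frame(col1   = codes1[i],
               col2   = codes2[i],
               pvalue = test$p.value,
               dvalue = unname(test$statistic))
  })
  # Drop skipped pairs, then stack the one-row data frames.
  rows <- Filter(Negate(is.null), rows)
  do.call(rbind, rows)
}

result <- ks_pairs(lab_data, c("A", "A"), c("B", "C"))
print(result)

# Optionally save the table to a file instead of printing it.
write.csv(result, file.path(tempdir(), "ks_results.csv"), row.names = FALSE)
```

Each row of result corresponds to one pair of test codes that passed the min_count check. If no pair passes, do.call(rbind, ...) on an empty list returns NULL, so check for that before writing the file.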