
Subsetting a list of dataframes based on a ranked column in R


I have a list of dataframes. I want to keep only the dataframes that contain a row whose score is more than 10-fold lower than the second-lowest score, and drop all the others. Any idea how to approach this? Thanks!

>Output
$E1
  ID    model   score
  E1      AAA    2
  E1      BBB    100
  E1      CCC    130
  E1      ZZZ    120
  E1      YYY    128

$E2
  ID    model   score
  E2      XXX    130
  E2      ASD    144
  E2      DFE    142
  E2      FGS    145
  E2      GFH    124

Preferred result:

>Output_subset
$E1
  ID    model   score
  E1      AAA    2
  E1      BBB    100
  E1      CCC    130
  E1      ZZZ    120
  E1      YYY    128

Solution

  • You can write a function that checks whether the lowest score is less than one tenth of the second-lowest score:

    check_data <- function(df) {
      x <- sort(df$score)    # scores in ascending order
      x[1] < (x[2] / 10)     # TRUE if the lowest is under a tenth of the second-lowest
    }
    

    You can pass this function to Filter() in base R:

    Filter(check_data, Output)
    
    #$E1
    #  ID model score
    #1 E1   AAA     2
    #2 E1   BBB   100
    #3 E1   CCC   130
    #4 E1   ZZZ   120
    #5 E1   YYY   128
    

    Or use keep() from purrr:

    purrr::keep(Output, check_data)
    

    Data:

    Output <- list(E1 = structure(list(ID = c("E1", "E1", "E1", "E1", "E1"), 
    model = c("AAA", "BBB", "CCC", "ZZZ", "YYY"), score = c(2L, 
    100L, 130L, 120L, 128L)), class = "data.frame", row.names = c(NA, 
    -5L)), E2 = structure(list(ID = c("E2", "E2", "E2", "E2", "E2"
    ), model = c("XXX", "ASD", "DFE", "FGS", "GFH"), score = c(130L, 
    144L, 142L, 145L, 124L)), class = "data.frame", row.names = c(NA, -5L)))
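
If some dataframes might have fewer than two scores, or if the fold threshold needs to vary, the same check can be made a little more defensive. The following is only a sketch under those assumptions: `has_low_outlier`, its `fold` argument, and the single-row `E3` dataframe are hypothetical names and data added for illustration, not part of the original question or answer.

```r
# Hypothetical helper (not from the original answer): TRUE when the lowest
# score is less than 1/fold of the second-lowest score.
has_low_outlier <- function(df, fold = 10) {
  x <- sort(df$score)                # ascending; sort() drops NA values by default
  if (length(x) < 2) return(FALSE)   # no second score to compare against
  x[1] < x[2] / fold
}

# Stand-in data with the same scores as in the question, plus a made-up
# single-row dataframe (E3) to exercise the edge case.
Output <- list(
  E1 = data.frame(ID = "E1", model = c("AAA", "BBB", "CCC", "ZZZ", "YYY"),
                  score = c(2L, 100L, 130L, 120L, 128L)),
  E2 = data.frame(ID = "E2", model = c("XXX", "ASD", "DFE", "FGS", "GFH"),
                  score = c(130L, 144L, 142L, 145L, 124L)),
  E3 = data.frame(ID = "E3", model = "QQQ", score = 5L)
)

Output_subset <- Filter(has_low_outlier, Output)
names(Output_subset)
# [1] "E1"
```

For E1 the sorted scores start 2, 100, and 2 < 100/10, so it is kept. For E2 they start 124, 130, and 124 is not below 13, so it is dropped. E3 has no second score and is dropped by the length check.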