I have a list of data frames. I want to keep only the data frames that contain a row whose score is 10-fold lower than the second-ranking score, and remove all the others. Any idea how to approach this? Thanks!
ID model score
E1 AAA 2
E1 BBB 100
E1 CCC 130
E1 ZZZ 120
E1 YYY 128
ID model score
E2 XXX 130
E2 ASD 144
E2 DFE 142
E2 FGS 145
E2 GFH 124
Preferred result:
ID model score
E1 AAA 2
E1 BBB 100
E1 CCC 130
E1 ZZZ 120
E1 YYY 128
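You can write a function that checks the condition between the two lowest scores:
check_data <- function(df) {
  # Sort the scores in ascending order
  x <- sort(df$score)
  # TRUE when the lowest score is less than a tenth of the second-lowest
  x[1] < (x[2] / 10)
}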
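To see what the check does, here is a small sketch that applies the same comparison by hand to the score columns from the question. The variable names (`scores_e1`, `x1`, `e1_passes`, and so on) are only for illustration.

```r
# Scores copied from the two example data frames in the question
scores_e1 <- c(2, 100, 130, 120, 128)
scores_e2 <- c(130, 144, 142, 145, 124)

# Ascending order, so position 1 is the lowest and position 2 the second-lowest
x1 <- sort(scores_e1)  # 2 100 120 128 130
x2 <- sort(scores_e2)  # 124 130 142 144 145

# Same comparison as in check_data
e1_passes <- x1[1] < x1[2] / 10  # 2 < 10     -> TRUE
e2_passes <- x2[1] < x2[2] / 10  # 124 < 13   -> FALSE
```

So E1 is kept and E2 is dropped, which matches the preferred result.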
You can use this function with Filter in base R:
Filter(check_data, Output)
# ID model score
#1 E1 AAA 2
#2 E1 BBB 100
#3 E1 CCC 130
#4 E1 ZZZ 120
#5 E1 YYY 128
Or with keep in purrr:
purrr::keep(Output, check_data)
Data:
Output <- list(E1 = structure(list(ID = c("E1", "E1", "E1", "E1", "E1"),
model = c("AAA", "BBB", "CCC", "ZZZ", "YYY"), score = c(2L,
100L, 130L, 120L, 128L)), class = "data.frame", row.names = c(NA,
-5L)), E2 = structure(list(ID = c("E2", "E2", "E2", "E2", "E2"
), model = c("XXX", "ASD", "DFE", "FGS", "GFH"), score = c(130L,
144L, 142L, 145L, 124L)), class = "data.frame", row.names = c(NA, -5L)))
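If some data frames in the list might have fewer than two usable scores, the comparison yields NA rather than TRUE or FALSE. A possible defensive variant is sketched below; `check_data_safe`, `frames`, and the extra `E3` data frame are hypothetical and made up for illustration, not part of the original data.

```r
# Hypothetical defensive variant of the check
check_data_safe <- function(df) {
  x <- sort(df$score)  # sort() drops NA values by default
  # Fewer than two scores: nothing to compare, so do not keep the data frame
  if (length(x) < 2) return(FALSE)
  x[1] < x[2] / 10
}

# Made-up example list; E3 has only one row
frames <- list(
  E1 = data.frame(ID = "E1", model = c("AAA", "BBB", "CCC"), score = c(2, 100, 130)),
  E2 = data.frame(ID = "E2", model = c("XXX", "ASD", "DFE"), score = c(130, 144, 142)),
  E3 = data.frame(ID = "E3", model = "QQQ", score = 5)
)

kept <- Filter(check_data_safe, frames)
names(kept)
```

With this input only E1 should remain: E2 fails the 10-fold condition and E3 is rejected because it has a single score. Returning FALSE for short data frames is a design choice; you could equally raise an error if such input is unexpected.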