I have a list of data frames. I want to keep only the data frames that contain a row whose score is 10-fold lower than the second-lowest score, and remove all the others. Any idea how to approach this? Thanks!
>Output
$E1
ID model score
E1 AAA 2
E1 BBB 100
E1 CCC 130
E1 ZZZ 120
E1 YYY 128
$E2
ID model score
E2 XXX 130
E2 ASD 144
E2 DFE 142
E2 FGS 145
E2 GFH 124
Preferred result:
>Output_subset
$E1
ID model score
E1 AAA 2
E1 BBB 100
E1 CCC 130
E1 ZZZ 120
E1 YYY 128
You can write a function that checks the condition between the two lowest scores:
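check_data <- function(df) {
  # Sort scores in ascending order, so x[1] is the lowest and x[2] the second lowest
  x <- sort(df$score)
  # TRUE when the lowest score is less than a tenth of the second lowest
  x[1] < (x[2]/10)
}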
You can use this function with Filter in base R:
Filter(check_data, Output)
#$E1
# ID model score
#1 E1 AAA 2
#2 E1 BBB 100
#3 E1 CCC 130
#4 E1 ZZZ 120
#5 E1 YYY 128
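Filter(f, x) keeps the elements of x for which f returns TRUE. The same subset can be written with plain indexing, which makes the intermediate TRUE/FALSE vector visible. A sketch, reusing check_data and the question's data (rebuilt here with data.frame() so the snippet runs on its own):

```r
check_data <- function(df) {
  x <- sort(df$score)
  x[1] < (x[2]/10)
}

Output <- list(
  E1 = data.frame(ID = "E1", model = c("AAA", "BBB", "CCC", "ZZZ", "YYY"),
                  score = c(2L, 100L, 130L, 120L, 128L)),
  E2 = data.frame(ID = "E2", model = c("XXX", "ASD", "DFE", "FGS", "GFH"),
                  score = c(130L, 144L, 142L, 145L, 124L))
)

# One TRUE/FALSE per data frame (E1 = TRUE, E2 = FALSE)
keep_flag <- vapply(Output, check_data, logical(1))

# which() drops FALSE and NA, which is also what Filter does internally
Output_subset <- Output[which(keep_flag)]
names(Output_subset)
```

Using which() rather than Output[keep_flag] matters only if check_data can return NA (for example, a data frame with a single row), where plain logical indexing would leave a NULL entry in the result.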
Or with keep in purrr:
purrr::keep(Output, check_data)
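To see which data frames are removed rather than kept, base R's Negate() can wrap the same predicate (purrr::discard() is the purrr counterpart of keep()). A base R sketch, again rebuilding the question's data so it runs on its own:

```r
check_data <- function(df) {
  x <- sort(df$score)
  x[1] < (x[2]/10)
}

Output <- list(
  E1 = data.frame(ID = "E1", model = c("AAA", "BBB", "CCC", "ZZZ", "YYY"),
                  score = c(2L, 100L, 130L, 120L, 128L)),
  E2 = data.frame(ID = "E2", model = c("XXX", "ASD", "DFE", "FGS", "GFH"),
                  score = c(130L, 144L, 142L, 145L, 124L))
)

# Negate(check_data) returns TRUE where check_data returns FALSE
removed <- Filter(Negate(check_data), Output)
names(removed)  # "E2"
```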
Data:
Output <- list(E1 = structure(list(ID = c("E1", "E1", "E1", "E1", "E1"),
model = c("AAA", "BBB", "CCC", "ZZZ", "YYY"), score = c(2L,
100L, 130L, 120L, 128L)), class = "data.frame", row.names = c(NA,
-5L)), E2 = structure(list(ID = c("E2", "E2", "E2", "E2", "E2"
), model = c("XXX", "ASD", "DFE", "FGS", "GFH"), score = c(130L,
144L, 142L, 145L, 124L)), class = "data.frame", row.names = c(NA, -5L)))
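One caveat with check_data: if a data frame has fewer than two scores, x[2] is NA and the check returns NA instead of TRUE or FALSE. A minimal sketch of a more defensive variant, assuming you also want the fold to be adjustable; the name has_fold_gap, its fold argument, and the one-row E3 element are hypothetical additions, not part of the original answer:

```r
# Hypothetical variant of check_data: the fold is a parameter, and data frames
# with fewer than two non-missing scores are treated as not matching.
has_fold_gap <- function(df, fold = 10) {
  x <- sort(df$score)              # sort() drops NA values by default
  if (length(x) < 2) return(FALSE)
  x[1] < x[2] / fold
}

Output <- list(
  E1 = data.frame(ID = "E1", model = c("AAA", "BBB", "CCC", "ZZZ", "YYY"),
                  score = c(2L, 100L, 130L, 120L, 128L)),
  E2 = data.frame(ID = "E2", model = c("XXX", "ASD", "DFE", "FGS", "GFH"),
                  score = c(130L, 144L, 142L, 145L, 124L)),
  E3 = data.frame(ID = "E3", model = "QQQ", score = 5L)  # hypothetical one-row case
)

# E1 = TRUE, E2 = FALSE, E3 = FALSE
res <- vapply(Output, has_fold_gap, logical(1))

# Extra arguments are passed through to the predicate, e.g. a 5-fold threshold
res5 <- vapply(Output, has_fold_gap, logical(1), fold = 5)

Output_subset <- Filter(has_fold_gap, Output)
```

Returning FALSE for short data frames keeps the predicate strictly TRUE/FALSE, which is what purrr::keep() expects.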