Consider the following data frame, obtained after a cbind operation on two lists:
> fl
  x meanlist
1 1     48.5
2 2     32.5
3 3     28.0
4 4     27.0
5 5     25.5
6 6     20.5
7 7     27.0
8 8     24.0
class_median <- list(0, 15, 25, 35, 45)
class_list <- list(0:10, 10:20, 20:30, 30:40, 40:50)
The values in class_median represent the classes -10 to +10, 10 to 20, 20 to 30, etc.
First, I am trying to group the values in fl$meanlist according to the classes in class_list. Second, I am trying to return one value per class, namely the one closest to that class's median, as follows:
> fl_subset
  x meanlist cm
1 1     48.5 45
2 2     32.5 35
3 5     25.5 25
I am trying to use loops for the comparison, but the code is getting long and unmanageable, and the result is not correct.
Here's an approach with dplyr:
library(dplyr)

# do a little prep--name classes, extract breaks, put medians in a data frame
names(class_list) = letters[seq_along(class_list)]
breaks = c(min(class_list[[1]]), sapply(class_list, max))
med_data = data.frame(median = unlist(class_median), class = names(class_list))

fl %>%
  # assign classes
  mutate(class = cut(meanlist, breaks = breaks, labels = names(class_list))) %>%
  # get medians
  left_join(med_data) %>%
  # within each class...
  group_by(class) %>%
  # keep the row with the smallest absolute difference to the median
  slice(which.min(abs(meanlist - median))) %>%
  # sort in original order
  arrange(x)
# Joining, by = "class"
# # A tibble: 3 x 4
# # Groups:   class [3]
#       x meanlist class median
#   <int>    <dbl> <fct>  <dbl>
# 1     1     48.5 e         45
# 2     2     32.5 d         35
# 3     5     25.5 c         25
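If you would rather not depend on dplyr, the same three steps (assign a class, attach its median, keep the closest row per class) can be sketched in base R. This is only an illustrative sketch: fl is rebuilt here from the values printed in the question, and the helper names idx and pieces are mine, not part of the original code.

```r
# Rebuild the example data from the values printed in the question
# (assumption: x is an integer column and meanlist is numeric).
fl <- data.frame(
  x = 1:8,
  meanlist = c(48.5, 32.5, 28.0, 27.0, 25.5, 20.5, 27.0, 24.0)
)
class_median <- list(0, 15, 25, 35, 45)
class_list <- list(0:10, 10:20, 20:30, 30:40, 40:50)

# Class boundaries taken from class_list: 0, 10, 20, 30, 40, 50
breaks <- c(min(class_list[[1]]), sapply(class_list, max))

# Position (1 to 5) of the class each value falls in.
# cut() uses right-closed intervals by default: (0,10], (10,20], ...
idx <- as.integer(cut(fl$meanlist, breaks = breaks))

# Attach the median of the assigned class
fl$cm <- unlist(class_median)[idx]

# Within each class, keep the row with the smallest absolute
# difference to that class's median
pieces <- lapply(split(fl, idx), function(d) {
  d[which.min(abs(d$meanlist - d$cm)), ]
})

# Recombine and sort in original order
fl_subset <- do.call(rbind, pieces)
fl_subset <- fl_subset[order(fl_subset$x), ]
rownames(fl_subset) <- NULL
fl_subset
```

On the sample data this should keep the rows with x equal to 1, 2, and 5, the same rows as the dplyr pipeline. Note that a value outside the range of breaks would get an NA class and be dropped by split().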