Group and compare numbers in a list


Consider the following data frame, obtained after a cbind operation on two lists:

> fl
  x meanlist
1 1     48.5
2 2     32.5
3 3     28.0
4 4     27.0
5 5     25.5
6 6     20.5
7 7     27.0
8 8     24.0
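
To make the example reproducible, a data frame with the same values can be rebuilt as sketched below. The original cbind call is not shown, so the data.frame() construction here is an assumption and not necessarily the code that produced fl.

```r
# Rebuild the data frame printed above.
# Assumption: data.frame() stands in for the original cbind step, which is not shown.
fl <- data.frame(
  x = 1:8,
  meanlist = c(48.5, 32.5, 28.0, 27.0, 25.5, 20.5, 27.0, 24.0)
)

# Inspect the result; it should have 8 rows and the columns x and meanlist
fl
```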

class_median <- list(0, 15, 25, 35, 45)
class_list <- list(0:10, 10:20, 20:30, 30:40, 40:50)

The values in class_median are the medians of the classes -10 to +10, 10 to 20, 20 to 30, etc.

First, I am trying to group the values in fl$meanlist according to the classes in class_list. Second, for each class I want to return the one value that is closest to the class median, as follows:

> fl_subset
  x meanlist cm
1 1     48.5 45
2 2     32.5 35
3 5     25.5 25

I am trying to use loops for the comparison, but the code is long and unmanageable, and the result is not correct.


Solution

  • Here's an approach with dplyr:

    library(dplyr)
    
    # do a little prep--name classes, extract breaks, put medians in a data frame
    names(class_list) = letters[seq_along(class_list)]
    breaks = c(min(class_list[[1]]), sapply(class_list, max))
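    # store class as a factor with the same levels that cut() will produce,
    # so the join below matches factor to factor regardless of R version
    med_data = data.frame(median = unlist(class_median),
                          class = factor(names(class_list), levels = names(class_list)))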
    
    
    fl %>% 
      # assign classes
      mutate(class = cut(meanlist, breaks = breaks, labels = names(class_list))) %>%
      # get medians
      left_join(med_data, by = "class") %>%
      # within each class...
      group_by(class) %>%
      # keep the row with the smallest absolute difference to the median
      slice(which.min(abs(meanlist - median))) %>%
      # sort in original order
      arrange(x)
    
    # # A tibble: 3 x 4
    # # Groups:   class [3]
    #       x meanlist class median
    #   <int>    <dbl> <fct>  <dbl>
    # 1     1     48.5 e         45
    # 2     2     32.5 d         35
    # 3     5     25.5 c         25
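
For comparison, the same grouping can be sketched in base R without any packages. This is a minimal sketch, not part of the original answer: the names idx, dist, keep and the rebuilt fl are introduced here for illustration, and the classes are assumed to be right-closed intervals such as (20, 30], which is the default behaviour of cut().

```r
# Data from the question (fl is rebuilt here so the block is self-contained)
fl <- data.frame(x = 1:8,
                 meanlist = c(48.5, 32.5, 28, 27, 25.5, 20.5, 27, 24))
class_median <- list(0, 15, 25, 35, 45)
class_list <- list(0:10, 10:20, 20:30, 30:40, 40:50)

# Class boundaries: lower end of the first class, then the upper end of each class
breaks <- c(min(class_list[[1]]), sapply(class_list, max))

# Integer index of the class each value falls into
idx <- cut(fl$meanlist, breaks = breaks, labels = FALSE)

# Attach the class median (cm) and the absolute distance to it
fl$cm <- unlist(class_median)[idx]
fl$dist <- abs(fl$meanlist - fl$cm)

# Within each class, keep the row number with the smallest distance
keep <- unlist(lapply(split(seq_len(nrow(fl)), idx),
                      function(rows) rows[which.min(fl$dist[rows])]))

# Restore the original row order and drop the helper column
fl_subset <- fl[sort(keep), c("x", "meanlist", "cm")]
rownames(fl_subset) <- NULL
fl_subset
```

With the data above, fl_subset should contain the rows with x equal to 1, 2 and 5, matching the output requested in the question. Values that fall outside all classes get an NA index and are dropped by split().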