Search code examples
rfunctiondataframefeature-extraction

Function that works with values in a seq()


I have a df where I want to count per id the amount of times where column c < value.

structure(list(id = c(14, 14, 15, 15, 15, 26, 26, 26, 26), a = c(1, 
2, NA, 7, NA, 2, NA, 2, 3), b = c(2, 4, 8, NA, 1, 4, 2, 9, 8), 
    c = c(2.3, 4.4, 1.3, 5.4, 3, NA, 1, 0, 3)), class = "data.frame", row.names = c(NA, 
-9L))

  id  a  b   c
1 14  1  2 2.3
2 14  2  4 4.4
3 15 NA  8 1.3
4 15  7 NA 5.4
5 15 NA  1 3.0
6 26  2  4  NA
7 26 NA  2 1.0
8 26  2  9 0.0
9 26  3  8 3.0

I want to make a df with the results of all the thresholds that I took. Which would result in:

thres_range <- seq(1, 3) # values I want to try as threshold

     id thres1 thres2 thres3
1    14      0      0      1
2    15      0      1      1
3    26      1      2      2

I already got the code that counts per id the amount of times where column c < value. However, I can't manage to write a function that applies that code for all the values in a seq() and puts the results in 1 dataframe

library(dplyr)
thres_range <- seq(1, 3) # values I want to try as threshold

fun <- function(thres) {
  w <- paste0("thres", thres) # give column name e.g. thres2, thres3 etc
  df %>% group_by(id) %>% 
    summarise(w = sum(c < thres, na.rm=TRUE))
}

sapply(thres_range, function(L) fun(L))

Any suggestions would be appreciated! Thanks in advance! :D


Solution

  • We loop over the 'thresh_range' with map, grouped by 'id', summarise to return the sum of logical expression in 'w', do a inner_join in reduce by 'id', and rename the columns if neccessary

    library(dplyr)
    library(purrr)
    map(thres_range, ~ 
          df %>%
            group_by(id) %>%
            summarise(w = sum(c < .x, na.rm = TRUE))) %>% 
        reduce(inner_join, by = 'id') %>%
        rename_at(vars(starts_with('w')), ~ str_c('thresh', seq_along(.)))
    # A tibble: 3 x 4
    #     id thresh1 thresh2 thresh3
    #  <dbl>   <int>   <int>   <int>
    #1    14       0       0       1
    #2    15       0       1       1
    #3    26       1       2       2
    

    If we are creating a function, the assignment 'w' on the lhs with = will result in evaluating 'w' literaly instead of the value inside. We need

      summarise(!! w := sum(c < thresh, na.rm = TRUE))
    

    i.e.

    f1 <- function(dat, thresh) {
             w <- str_c('thresh', thresh)
             dat %>%
                 group_by(id) %>%
                 summarise(!! w := sum(c < thresh, na.rm = TRUE))
         }
    
    map(thres_range,  f1, dat = df) %>%
       reduce(inner_join, by = 'id')