r dplyr magrittr

dplyr: custom function in mutate. Uses full matrix instead of chunks?


Consider this example:

library(dplyr)
library(magrittr)

set.seed(123)
grp_s <- round(runif(4, 1, 10))
group <- rep(1:length(grp_s), grp_s)
dataF <- data.frame(grouping = group, var_a = runif(length(group)), var_b = runif(length(group)), var_c = runif(length(group)))

compute_it <- function(var_a, var_b){
    sum(var_a[var_b > .5], na.rm = TRUE)
}

dataF %<>%
        group_by(grouping) %>%
        mutate(fix_it = compute_it(var_a, var_b))

So far so good. Now, instead of compute_it, which takes columns as arguments, I would like to use a function that takes a chunk of the data as its argument (one chunk for each value of grouping).

Something like using this function:

compute_it_2 <- function(Data){
    sum(Data$var_a[Data$var_b > .5], na.rm = TRUE)
}

in place of compute_it in the pipeline above. How can I do that?


Solution

  • Using tidyr and purrr as well, we can either use do() or nest() first:

    library(tidyverse)
    
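    dataF %>%
      group_by(grouping) %>%
      # `.` is the data frame holding the current group's rows
      do(fix_it = compute_it_2(.)) %>%
      # fix_it comes back as a list column; unnest() flattens it to numeric
      unnest(fix_it)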
    

    Giving:

    # A tibble: 4 × 2
      grouping    fix_it
         <int>     <dbl>
    1        1 2.4065483
    2        2 0.9568333
    3        3 0.0000000
    4        4 1.8274955
    

    Or the nesting approach:

    dataF %>% 
      group_by(grouping) %>% 
      nest() %>% 
      mutate(fix_it = map_dbl(data, compute_it_2))
    
    # A tibble: 4 × 3
      grouping             data    fix_it
         <int>           <list>     <dbl>
    1        1 <tibble [4 × 3]> 2.4065483
    2        2 <tibble [8 × 3]> 0.9568333
    3        3 <tibble [5 × 3]> 0.0000000
    4        4 <tibble [9 × 3]> 1.8274955
    

    If you unnest() the result of the second option, you get the original rows back, with fix_it repeated on every row of its group:

    # A tibble: 26 × 5
       grouping    fix_it     var_a      var_b      var_c
          <int>     <dbl>     <dbl>      <dbl>      <dbl>
    1         1 2.4065483 0.9404673 0.96302423 0.12753165
    2         1 2.4065483 0.0455565 0.90229905 0.75330786
    3         1 2.4065483 0.5281055 0.69070528 0.89504536
    4         1 2.4065483 0.8924190 0.79546742 0.37446278
    5         2 0.9568333 0.5514350 0.02461368 0.66511519
    6         2 0.9568333 0.4566147 0.47779597 0.09484066
    7         2 0.9568333 0.9568333 0.75845954 0.38396964
    8         2 0.9568333 0.4533342 0.21640794 0.27438364
    9         2 0.9568333 0.6775706 0.31818101 0.81464004
    10        2 0.9568333 0.5726334 0.23162579 0.44851634
    # ... with 16 more rows
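
As a further sketch along the same lines (not part of the answer above, and assuming a dplyr version that provides group_modify(), which I believe means 0.8.0 or later): group_modify() passes each group's rows to a function as a data frame, so compute_it_2 can be applied to the chunks directly. The names per_group, base_check and with_fix below are made up for illustration. The base R split()/vapply() line is only there as a cross-check that both routes agree, and the join at the end is one way to put the per-group value back onto every original row.

```r
library(dplyr)

# Same data and helper function as in the question.
set.seed(123)
grp_s <- round(runif(4, 1, 10))
group <- rep(seq_along(grp_s), grp_s)
dataF <- data.frame(
  grouping = group,
  var_a = runif(length(group)),
  var_b = runif(length(group)),
  var_c = runif(length(group))
)

compute_it_2 <- function(Data) {
  sum(Data$var_a[Data$var_b > .5], na.rm = TRUE)
}

# group_modify() calls the function once per group; `.x` is that group's
# rows (without the grouping column) and the function must return a data frame.
per_group <- dataF %>%
  group_by(grouping) %>%
  group_modify(~ data.frame(fix_it = compute_it_2(.x))) %>%
  ungroup()

# Base R cross-check: split the frame into chunks and apply the same function.
base_check <- vapply(split(dataF, dataF$grouping), compute_it_2, numeric(1))

# Attach the per-group value to every original row (hypothetical name with_fix).
with_fix <- left_join(dataF, per_group, by = "grouping")
```

Newer dplyr releases also expose the current chunk inside mutate() itself (cur_data() in 1.0.x, pick(everything()) from 1.1.0 as far as I know), which would avoid the join, but check which of those your installed version supports before relying on it.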