Search code examples
rdplyrgroup-bysummarizegroup-summaries

How to combine summarize_at and custom function that requires input from multiple columns in R?


I have a list of employees actual capacity (which changes each month) and their scheduled capacity (which is constant every month). I want to use summarize_at to tell what percentage they are over (or under) their allocation. However, I can't figure out how to pipe my custom function through my summarize call. I tried looking at this answer, but my function is different in that it requires input from multiple columns.

Here's a sample dataset:

library(dplyr)
question <- tibble(name = c("justin", "justin", "corey", "corey"),
                   allocation_1 = c(1, 2, 4, 8),
                   allocation_2 = c(2, 4, 11, 9),
                   scheduled_allocation = c(3, 3, 4, 4))

Here's what I want:

library(dplyr)
answer <- tibble(name = c("justin", "corey"),
                 allocation_1 = c(100, 300),
                 allocation_2 = c(200, 500))

And here's what I've got so far. I know the custom function works--I just can't get it to pipe through. X would correspond to their summed allocation (e.g., for justin for allocation1, 1+2 = 3), and Y is their scheduled allocation (e.g., 3--not 6). Thus, 3/3 = 1 *100 = 100 percent allocated.

#custom function that works

get_cap_percent <- function (x, y) {
  100*(x/y)
}

#Code that doesn't work
question %>%
  dplyr::group_by(name) %>%
  summarise_at(vars(contains("allocation_")), sum, na.rm = TRUE) %>%
  summarise_at(vars(contains("allocation_")), get_cap_percent, x = ., y = scheduled_allocation)


Solution

  • We can wrap it in a single summarise as after the summarise step there won't be any other columns except those columns and the grouping

    library(dplyr)
    question %>% 
        group_by(name) %>%
        summarise(across(contains('allocation_'), ~
         get_cap_percent(sum(., na.rm = TRUE), first(scheduled_allocation))))
    

    -output

    # A tibble: 2 x 3
      name   allocation_1 allocation_2
      <chr>         <dbl>        <dbl>
    1 corey           300          500
    2 justin          100          200