Loop through a number of variables and pass each into a function in R

I am working in R.

I have some data about staff in a school:

library(dplyr)

data <- data.frame(person_id = c(1, 2, 3, 4, 5, 6, 7, 8), 
                   disability_status = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
                   age_group = c("20-30","30-40","20-30","30-40","20-30","30-40","20-30","30-40"), 
                   teacher = c("yes", "no", "no", "yes", "no","yes", "no", "yes" ))

I have written a function that counts the staff in each level of whichever variable you pass to it. The "group_tag" argument is there to help with debugging later in my code.

group_the_data <- function(data, 
                           variable, 
                           group_tag) {
  
  grouped_output <- data %>%
                    mutate(flag = 1) %>%
                    group_by({{variable}}) %>%
                    summarise(number_staff = sum(flag, na.rm = T)) %>%
                    mutate(grouping_tag := {{group_tag}})
  
  return(grouped_output)
  
}

I then use the function to group by disability_status, age_group and teacher in turn:

disability_grouped <- group_the_data(data = data,
                                     variable = disability_status,
                                     group_tag = "disability status")

age_group_grouped <- group_the_data(data = data,
                                    variable = age_group,
                                    group_tag = "age group")

role_grouped <- group_the_data(data = data,
                               variable = teacher,
                               group_tag = "role")

Once I have the dataframes I need, I bind them together:

all_data_grouped <- bind_rows(disability_grouped, age_group_grouped, role_grouped)

Is there a way to loop through the variables so I don't need to write out the function call three times?

Or would one of the apply functions be a better idea?

Solution

You can use lapply or purrr::map to iterate over your variables. Because the loop runs over column names stored as strings rather than bare variable names, {{variable}} no longer applies; instead, select the column by name inside group_by with pick().

library(tidyverse)

group_the_data <- function(data, 
                           variable, 
                           group_tag) {
  
  grouped_output <- data %>%
    mutate(flag = 1) %>%
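    group_by(pick(all_of(variable))) %>% # variable is a string; all_of() selects that column by name
    summarise(number_staff = sum(flag, na.rm = TRUE)) %>%
    mutate(grouping_tag = group_tag) # group_tag is a plain string, so {{ }} is not needed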
  
  return(grouped_output)
  
}

purrr::map(colnames(data)[-1], 
           ~ group_the_data(data, variable = .x, group_tag = .x)) %>% 
  bind_rows()

# A tibble: 6 × 5
  disability_status number_staff grouping_tag      age_group teacher
  <chr>                    <dbl> <chr>             <chr>     <chr>  
1 no                           4 disability_status NA        NA     
2 yes                          4 disability_status NA        NA     
3 NA                           4 age_group         20-30     NA     
4 NA                           4 age_group         30-40     NA     
5 NA                           4 teacher           NA        no     
6 NA                           4 teacher           NA        yes
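
The lapply route mentioned above works the same way. Here is a minimal, self-contained sketch, assuming dplyr 1.1.0 or later (for pick()) and the toy data from the question; the vars vector is just an illustrative name, not something from the original post:

```r
library(dplyr)

# Toy data from the question
data <- data.frame(
  person_id = c(1, 2, 3, 4, 5, 6, 7, 8),
  disability_status = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
  age_group = c("20-30", "30-40", "20-30", "30-40", "20-30", "30-40", "20-30", "30-40"),
  teacher = c("yes", "no", "no", "yes", "no", "yes", "no", "yes")
)

# String-based version of the function: `variable` is a column name as text
group_the_data <- function(data, variable, group_tag) {
  data %>%
    mutate(flag = 1) %>%
    group_by(pick(all_of(variable))) %>%
    summarise(number_staff = sum(flag, na.rm = TRUE)) %>%
    mutate(grouping_tag = group_tag)
}

# Hypothetical helper vector holding the columns to loop over
vars <- c("disability_status", "age_group", "teacher")

# lapply() returns a list of tibbles; bind_rows() stacks them into one
all_data_grouped <- lapply(vars, function(v) {
  group_the_data(data, variable = v, group_tag = v)
}) %>%
  bind_rows()
```

Whether you prefer lapply or purrr::map here is mostly a matter of taste; both hand each column name to the function one at a time and return a list.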

Similarly, use purrr::map2 if you want "variable" and "group_tag" to take different values:

purrr::map2(colnames(data)[-1], 
            c("disability status", "age group", "role"), 
            ~ group_the_data(data, variable = .x, group_tag = .y)) %>% 
  bind_rows()

# A tibble: 6 × 5
  disability_status number_staff grouping_tag      age_group teacher
  <chr>                    <dbl> <chr>             <chr>     <chr>  
1 no                           4 disability status NA        NA     
2 yes                          4 disability status NA        NA     
3 NA                           4 age group         20-30     NA     
4 NA                           4 age group         30-40     NA     
5 NA                           4 role              NA        no     
6 NA                           4 role              NA        yes
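
If keeping two parallel vectors in the right order feels fragile, one possible alternative is to store each tag as the name of its column in a single named vector and iterate with purrr::imap, which passes both the value and its name to the function. This is only a sketch under the same assumptions as above (dplyr 1.1.0 or later, the question's toy data); tagged_vars is a made-up name for illustration:

```r
library(dplyr)
library(purrr)

# Toy data from the question
data <- data.frame(
  person_id = c(1, 2, 3, 4, 5, 6, 7, 8),
  disability_status = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
  age_group = c("20-30", "30-40", "20-30", "30-40", "20-30", "30-40", "20-30", "30-40"),
  teacher = c("yes", "no", "no", "yes", "no", "yes", "no", "yes")
)

# String-based version of the function: `variable` is a column name as text
group_the_data <- function(data, variable, group_tag) {
  data %>%
    mutate(flag = 1) %>%
    group_by(pick(all_of(variable))) %>%
    summarise(number_staff = sum(flag, na.rm = TRUE)) %>%
    mutate(grouping_tag = group_tag)
}

# Hypothetical lookup: names are the tags, values are the column names
tagged_vars <- c(
  "disability status" = "disability_status",
  "age group"         = "age_group",
  "role"              = "teacher"
)

# imap() calls the function with each value (column) and its name (tag)
all_data_grouped <- imap(tagged_vars, function(column, tag) {
  group_the_data(data, variable = column, group_tag = tag)
}) %>%
  bind_rows()
```

The result should match the map2 output above; the only difference is that each tag sits next to its column, so the two cannot drift out of order.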