Tags: r, function, loops, dplyr, apply

Loop through a number of variables and insert each into a function in R


I am working in R.

I have some data about staff in a school:

data <- data.frame(person_id = c(1, 2, 3, 4, 5, 6, 7, 8), 
                   disability_status = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
                   age_group = c("20-30","30-40","20-30","30-40","20-30","30-40","20-30","30-40"), 
                   teacher = c("yes", "no", "no", "yes", "no","yes", "no", "yes" ))

I have written a function that counts the staff in each level of whichever variable is passed to it. The "group_tag" argument is there to help with debugging later in my code.

library(dplyr)

group_the_data <- function(data, 
                           variable, 
                           group_tag) {
  
  grouped_output <- data %>%
                    mutate(flag = 1) %>%                 # one flag per staff member
                    group_by({{variable}}) %>%           # variable is a bare column name
                    summarise(number_staff = sum(flag, na.rm = TRUE)) %>%
                    mutate(grouping_tag := {{group_tag}})
  
  return(grouped_output)
  
}

I then use the function to group by disability_status, age_group and teacher in turn:

disability_grouped <- group_the_data(data = data,
                                     variable = disability_status,
                                     group_tag = "disability status")

age_group_grouped <- group_the_data(data = data,
                                    variable = age_group,
                                    group_tag = "age group")

role_grouped <- group_the_data(data = data,
                               variable = teacher,
                               group_tag = "role")

Once I have the dataframes I need, I bind them together:

all_data_grouped <- bind_rows(disability_grouped, age_group_grouped, role_grouped)

Is there a way to loop through the variables so I don't need to write out the function three times?

Or would one of the apply functions be a better idea?


Solution

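  • You can use lapply or purrr::map to iterate over your variables. The iteration runs over column names as strings rather than over bare (unquoted) column names, so {{variable}} no longer applies inside group_by; select the column there with pick() instead (pick() needs dplyr 1.1.0 or later).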

    library(tidyverse)
    
    group_the_data <- function(data, 
                               variable, 
                               group_tag) {
      
      grouped_output <- data %>%
        mutate(flag = 1) %>%
        group_by(pick(all_of(variable))) %>% # variable holds a column name as a string; all_of() makes that explicit
        summarise(number_staff = sum(flag, na.rm = TRUE)) %>%
        mutate(grouping_tag := {{group_tag}})
      
      return(grouped_output)
      
    }
    
    purrr::map(colnames(data)[-1], 
               ~ group_the_data(data, variable = .x, group_tag = .x)) %>% 
      bind_rows()
    
    # A tibble: 6 × 5
      disability_status number_staff grouping_tag      age_group teacher
      <chr>                    <dbl> <chr>             <chr>     <chr>  
    1 no                           4 disability_status NA        NA     
    2 yes                          4 disability_status NA        NA     
    3 NA                           4 age_group         20-30     NA     
    4 NA                           4 age_group         30-40     NA     
    5 NA                           4 teacher           NA        no     
    6 NA                           4 teacher           NA        yes 
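
    The answer mentions lapply but only shows purrr::map. A minimal base R sketch of the same iteration follows, assuming dplyr 1.1.0 or later for pick(). The names vars, tags, same_tag and own_tag are illustrative and do not come from the original post; Map is used as the base R counterpart of a two-input map.

    ```r
    library(dplyr)  # pick() assumes dplyr >= 1.1.0

    # Example data from the question
    data <- data.frame(
      person_id = c(1, 2, 3, 4, 5, 6, 7, 8),
      disability_status = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
      age_group = c("20-30", "30-40", "20-30", "30-40", "20-30", "30-40", "20-30", "30-40"),
      teacher = c("yes", "no", "no", "yes", "no", "yes", "no", "yes")
    )

    # Same function as above: variable is a column name given as a string
    group_the_data <- function(data, variable, group_tag) {
      data %>%
        mutate(flag = 1) %>%
        group_by(pick(all_of(variable))) %>%
        summarise(number_staff = sum(flag, na.rm = TRUE)) %>%
        mutate(grouping_tag = group_tag)
    }

    vars <- colnames(data)[-1]                             # every column except person_id
    tags <- c("disability status", "age group", "role")   # illustrative labels

    # lapply: each string serves as both the grouping column and the tag
    same_tag <- bind_rows(lapply(vars, function(v) group_the_data(data, v, v)))

    # Map: walks vars and tags in parallel, one pair per call
    own_tag <- bind_rows(Map(function(v, tag) group_the_data(data, v, tag), vars, tags))
    ```

    Both results should have the same shape as the purrr output: six rows, with one grouping column per variable and NA where a column does not apply.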
    

    Similarly, use purrr::map2 if you want "variable" and "group_tag" to take different values:

    purrr::map2(colnames(data)[-1], 
                c("disability status", "age group", "role"), 
                ~ group_the_data(data, variable = .x, group_tag = .y)) %>% 
      bind_rows()
    
    # A tibble: 6 × 5
      disability_status number_staff grouping_tag      age_group teacher
      <chr>                    <dbl> <chr>             <chr>     <chr>  
    1 no                           4 disability status NA        NA     
    2 yes                          4 disability status NA        NA     
    3 NA                           4 age group         20-30     NA     
    4 NA                           4 age group         30-40     NA     
    5 NA                           4 role              NA        no     
    6 NA                           4 role              NA        yes
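
    The bound result has one column per grouping variable, padded with NA. If a single shared column is preferred, one possible variation is sketched below. This is not part of the original answer: the helper name count_by and the column name group_value are hypothetical, and as.character() is an added assumption so that variables of different types can still be stacked.

    ```r
    library(dplyr)

    # Example data from the question
    data <- data.frame(
      person_id = c(1, 2, 3, 4, 5, 6, 7, 8),
      disability_status = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
      age_group = c("20-30", "30-40", "20-30", "30-40", "20-30", "30-40", "20-30", "30-40"),
      teacher = c("yes", "no", "no", "yes", "no", "yes", "no", "yes")
    )

    # Hypothetical helper: counts rows per level of one column and stores
    # the level under a common name so that the pieces stack cleanly
    count_by <- function(data, variable, group_tag) {
      data %>%
        count(group_value = as.character(.data[[variable]]), name = "number_staff") %>%
        mutate(grouping_tag = group_tag)
    }

    vars <- colnames(data)[-1]
    tags <- c("disability status", "age group", "role")

    # One small table per variable, then stacked into a single long table
    all_data_long <- bind_rows(Map(function(v, tag) count_by(data, v, tag), vars, tags))
    ```

    Here number_staff comes from count(), so it is an integer row count rather than a sum of flags. The result should have six rows and three columns (group_value, number_staff, grouping_tag) with no NA padding.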