I am working in R.
I have some data about staff in a school:
data <- data.frame(person_id = c(1, 2, 3, 4, 5, 6, 7, 8),
                   disability_status = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
                   age_group = c("20-30", "30-40", "20-30", "30-40", "20-30", "30-40", "20-30", "30-40"),
                   teacher = c("yes", "no", "no", "yes", "no", "yes", "no", "yes"))
I have written a function that counts the staff within each level of whichever variable you pass to it. The "group_tag" argument is there to help with debugging later in my code.
group_the_data <- function(data,
                           variable,
                           group_tag) {
  grouped_output <- data %>%
    mutate(flag = 1) %>%
    group_by({{variable}}) %>%
    summarise(number_staff = sum(flag, na.rm = T)) %>%
    mutate(grouping_tag := {{group_tag}})
  return(grouped_output)
}
I then use the function to group by disability_status, age_group and teacher in turn:
disability_grouped <- group_the_data(data = data,
                                     variable = disability_status,
                                     group_tag = "disability status")
age_group_grouped <- group_the_data(data = data,
                                    variable = age_group,
                                    group_tag = "age group")
role_grouped <- group_the_data(data = data,
                               variable = teacher,
                               group_tag = "role")
Once I have the dataframes I need, I bind them together:
all_data_grouped <- bind_rows(disability_grouped, age_group_grouped, role_grouped)
Is there a way to loop through the variables so I don't need to write out the function call three times?
Or would one of the apply-family functions be a better fit?
You can use lapply or purrr::map to iterate over your variables. To do that, you loop over the column names as strings rather than as bare variable names, so inside the function you'll need to select the column with pick() in group_by().
library(tidyverse)
group_the_data <- function(data,
                           variable,
                           group_tag) {
  grouped_output <- data %>%
    mutate(flag = 1) %>%
    # `variable` is a string; all_of() makes the external-vector selection explicit
    group_by(pick(all_of(variable))) %>%
    summarise(number_staff = sum(flag, na.rm = TRUE)) %>%
    mutate(grouping_tag := {{group_tag}})
  return(grouped_output)
}
purrr::map(colnames(data)[-1],
           ~ group_the_data(data, variable = .x, group_tag = .x)) %>%
  bind_rows()
# A tibble: 6 × 5
  disability_status number_staff grouping_tag      age_group teacher
  <chr>                    <dbl> <chr>             <chr>     <chr>
1 no                           4 disability_status NA        NA
2 yes                          4 disability_status NA        NA
3 NA                           4 age_group         20-30     NA
4 NA                           4 age_group         30-40     NA
5 NA                           4 teacher           NA        no
6 NA                           4 teacher           NA        yes
Similarly, use purrr::map2 if you want "variable" and "group_tag" to take different values:
purrr::map2(colnames(data)[-1],
            c("disability status", "age group", "role"),
            ~ group_the_data(data, variable = .x, group_tag = .y)) %>%
  bind_rows()
# A tibble: 6 × 5
  disability_status number_staff grouping_tag      age_group teacher
  <chr>                    <dbl> <chr>             <chr>     <chr>
1 no                           4 disability status NA        NA
2 yes                          4 disability status NA        NA
3 NA                           4 age group         20-30     NA
4 NA                           4 age group         30-40     NA
5 NA                           4 role              NA        no
6 NA                           4 role              NA        yes
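Since lapply was mentioned but not shown, here is a minimal base R sketch of the same idea. It assumes dplyr 1.1.0 or later (for pick()), and the helper names vars, tags, same_tag and different_tag are only illustrative. lapply stands in for purrr::map and Map stands in for purrr::map2; the rows should match the outputs above.

```r
library(dplyr)

data <- data.frame(person_id = c(1, 2, 3, 4, 5, 6, 7, 8),
                   disability_status = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
                   age_group = c("20-30", "30-40", "20-30", "30-40", "20-30", "30-40", "20-30", "30-40"),
                   teacher = c("yes", "no", "no", "yes", "no", "yes", "no", "yes"))

# Same function as in the answer: `variable` is a column name given as a string
group_the_data <- function(data, variable, group_tag) {
  data %>%
    mutate(flag = 1) %>%
    group_by(pick(all_of(variable))) %>%
    summarise(number_staff = sum(flag, na.rm = TRUE)) %>%
    mutate(grouping_tag = group_tag)
}

# Illustrative names: the columns to group by and the tags to attach
vars <- c("disability_status", "age_group", "teacher")
tags <- c("disability status", "age group", "role")

# One argument varies: lapply over the column names, reusing each name as the tag
same_tag <- lapply(vars, function(v) {
  group_the_data(data, variable = v, group_tag = v)
}) %>%
  bind_rows()

# Two arguments vary in parallel: Map walks along vars and tags together
different_tag <- Map(function(v, tag) {
  group_the_data(data, variable = v, group_tag = tag)
}, vars, tags) %>%
  bind_rows()
```

Whether to use lapply/Map or purrr is mostly a matter of taste here; the base versions just avoid the extra dependency.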