Tags: r, dplyr, group-by, tidyverse, subset

Conditional subsetting; applying the same function to unique groups


In R, I am looking to conditionally subset my dataset so I can apply the same function to different groups of data. Here are dummy data:

data <- data.frame(id = seq(1, 100, by = 1),
                 sex = sample(c('M', 'F'), 100, replace = TRUE),
                 age_class = sample(c('A', 'S'), 100, replace = TRUE), # A = adult, S = subadult
                 season = sample(c('spring', 'autumn'), 100, replace = TRUE),
                 den_status = sample(c(0,1), 100, replace = TRUE), # 1 = yes, 0 = no. Only females can den and get a 1 or 0, males are all dummy coded as 0
                 weight = sample(80:600, 100, replace = TRUE),
                 offspring = sample(c('Y','N'), 100, replace = TRUE),
                 albumin = rnorm(100, 5, 2),
                 cortisol = rnorm(100, 30, 12),
                 calcium = rnorm(100, 0.3, 0.005),
                 globulin = rnorm(100, 1.9, 0.3),
                 insulin = rnorm(100, 3, 0.13))

As a first attempt, I grouped the data by sex, age_class, and den_status and applied a custom function called mod.zscore within each group. The function acts on columns albumin:insulin, and its output is stored in new columns with a _zscore suffix.

library(dplyr)

data.new <- data %>%
  group_by(sex, age_class, den_status) %>%
  mutate(across(albumin:insulin, mod.zscore, .names = "{.col}_zscore")) %>%
  ungroup()
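
The definition of mod.zscore is not shown in the question. To make the snippet above runnable, here is a minimal stand-in, assuming it computes a modified z-score from the median and the median absolute deviation (MAD). The formula and the 0.6745 constant are assumptions, and the real function may differ.

```r
# Hypothetical stand-in for mod.zscore; the original definition is not shown.
# Assumes a modified z-score: 0.6745 * (x - median) / MAD.
mod.zscore <- function(x) {
  med <- median(x, na.rm = TRUE)
  # constant = 1 returns the raw, unscaled median absolute deviation
  mad_raw <- mad(x, center = med, constant = 1, na.rm = TRUE)
  0.6745 * (x - med) / mad_raw
}

# Example call on a small vector with one extreme value
z <- mod.zscore(c(1, 2, 3, 4, 100))
```

Because the function takes a numeric vector and returns a vector of the same length, it can be passed directly to across() inside a grouped mutate().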

This works fine and does what I need it to do. Where I'm stuck is that I need to conditionally subset or group the data so that I only group by den_status when sex == 'F', age_class == 'A', and season == 'spring'. Currently, my code groups both males and females by den_status, which is not necessarily a problem because all males have den_status = 0 anyway. The problem is that I also need to group by season, and I only want den_status to split the adult spring females.

Basically, I want these groupings:

  1. Sex = F, age class = A, season = spring, den_status = 0
  2. Sex = F, age class = A, season = spring, den_status = 1
  3. Sex = F, age class = S, season = spring
  4. Sex = F, age class = A, season = autumn
  5. Sex = F, age class = S, season = autumn
  6. Sex = M, age class = A, season = spring
  7. Sex = M, age class = S, season = spring
  8. Sex = M, age class = A, season = autumn
  9. Sex = M, age class = S, season = autumn

Any help is greatly appreciated. Thank you!

EDIT: I think I'm looking for a solution that will not create new columns because I will need to again work in terms of columns sex, age_class, den_status, and season.


Solution

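library(dplyr)

data.new <- data %>%
  # Split season by den_status only for adult spring females
  mutate(season2 = case_when(
    sex == "F" & age_class == "A" & season == "spring" & den_status == 0 ~ "spring0",
    sex == "F" & age_class == "A" & season == "spring" & den_status == 1 ~ "spring1",
    TRUE ~ season
  )) %>%
  # sex, age_class, and season2 together define the nine target groups
  group_by(sex, age_class, season2) %>%
  mutate(n = n())  # group size, as a quick check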
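
The question's edit asks for a result without extra columns. Here is a minimal sketch of one way to get that, assuming dplyr 1.0 or later: compute the grouping key inside group_by(), apply the function, then drop the key after ungrouping. The name den_key, the -1 placeholder, the reduced dummy data, and the mod.zscore stand-in are all illustrative assumptions. The original mod.zscore is not shown, so a median/MAD modified z-score is assumed.

```r
library(dplyr)

set.seed(1)  # only so the sketch is reproducible
data <- data.frame(
  sex        = sample(c("M", "F"), 100, replace = TRUE),
  age_class  = sample(c("A", "S"), 100, replace = TRUE),
  season     = sample(c("spring", "autumn"), 100, replace = TRUE),
  den_status = sample(c(0, 1), 100, replace = TRUE),
  albumin    = rnorm(100, 5, 2),
  insulin    = rnorm(100, 3, 0.13)
)

# Hypothetical stand-in; the question does not show mod.zscore.
mod.zscore <- function(x) 0.6745 * (x - median(x)) / mad(x, constant = 1)

data.new <- data %>%
  group_by(
    sex, age_class, season,
    # den_key (hypothetical name) keeps den_status for adult spring females
    # and collapses every other row to a single placeholder value
    den_key = if_else(sex == "F" & age_class == "A" & season == "spring",
                      den_status, -1)
  ) %>%
  mutate(across(albumin:insulin, mod.zscore, .names = "{.col}_zscore")) %>%
  ungroup() %>%
  select(-den_key)  # drop the temporary key; the original columns are untouched
```

The columns sex, age_class, season, and den_status keep their original values, so later steps can still filter or group on them.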