r dplyr mutate

Is there a more compact way of writing chains of piped mutate calls?


In the tidyverse, I often find myself writing long chains of mutate calls: ... |> mutate(...) |> mutate(...) |> mutate(...) |> mutate(...) |> ... Is there a more compact way of writing this?

Example (please scroll down):

library(tidyverse)

REPEATS = 100
SAMPLE_SIZE = 617
N = REPEATS * SAMPLE_SIZE

BASELINE_SECURE_P = 0.6
LOG_BASELINE_SECURE_ODDS = log(BASELINE_SECURE_P / (1 - BASELINE_SECURE_P))
DAYCARE_LOG_OR_PER_HOUR = log(2.0)/3561 
WEEKS_PER_MONTH = 52 / 12

CARE_TYPES <- c("Mother", "Father", "Grandparent", "In-Home", "Child-Care Home", "Daycare")
CARE_TYPE_P <- c(.24, .15, .15, .15, .15, .36)
ACD <- c("A", "C", "D")
ACD_FREQ <- c(55, 197, 187) # Frequencies from 2001 Table 3 
ACD_P <- ACD_FREQ/sum(ACD_FREQ)

df <- data.frame(
  sample_no = rep(1:REPEATS, each=SAMPLE_SIZE),
  care_type = as.factor(sample(CARE_TYPES, N, prob = CARE_TYPE_P, replace = TRUE)),
  starting_age = runif(N, 0, 36)
) |> mutate(
  nonmaternal_hours_per_week = ifelse(care_type == "Mother", 0, pmax(0, rnorm(N, 30, 15))), 
) |> mutate(
  daycare_hours_per_week = ifelse(care_type == "Daycare", nonmaternal_hours_per_week, 0)
) |> mutate(
  nonmaternal_total_hours = nonmaternal_hours_per_week * WEEKS_PER_MONTH * (36 - starting_age),
  daycare_total_hours = daycare_hours_per_week * WEEKS_PER_MONTH * (36 - starting_age)
) |> mutate(
  secure_log_or = LOG_BASELINE_SECURE_ODDS - DAYCARE_LOG_OR_PER_HOUR * daycare_total_hours
) |> mutate(
  secure_p = exp(secure_log_or) / (1 + exp(secure_log_or)) 
) |> mutate(
  is_secure = rbinom(N, 1, secure_p),
  # Choose one of A, C, D attachment at random
  acd_random = sample(ACD, N, prob = ACD_P, replace = TRUE)
) |> mutate(
  ssp_abcd = as.factor(ifelse(is_secure, 'B', acd_random))
)

Solution

  • mutate (like other {dplyr} verbs) accepts a comma-separated sequence of expressions inside a single call (see the official examples), e.g.:

    some_dataframe |>
       mutate(var_1 = ...,
              var_2 = ...,
              ...
              )
    

    Even more conveniently, later expressions in the same call can refer to columns created by earlier ones:

    some_dataframe |>
        mutate(var_1 = ...,
               var_2 = var_1 * 42,
               ...
               )
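
Applied to the example from the question, the whole chain could be collapsed into a single mutate call along these lines. This is only a sketch: the constants and column definitions are copied from the question, while set.seed(42) is an assumption added here purely to make the run reproducible.

```r
library(dplyr)

set.seed(42)  # assumption: not in the question, added only for reproducibility

# Constants copied from the question
REPEATS <- 100
SAMPLE_SIZE <- 617
N <- REPEATS * SAMPLE_SIZE
BASELINE_SECURE_P <- 0.6
LOG_BASELINE_SECURE_ODDS <- log(BASELINE_SECURE_P / (1 - BASELINE_SECURE_P))
DAYCARE_LOG_OR_PER_HOUR <- log(2.0) / 3561
WEEKS_PER_MONTH <- 52 / 12
CARE_TYPES <- c("Mother", "Father", "Grandparent", "In-Home", "Child-Care Home", "Daycare")
CARE_TYPE_P <- c(.24, .15, .15, .15, .15, .36)
ACD <- c("A", "C", "D")
ACD_FREQ <- c(55, 197, 187)
ACD_P <- ACD_FREQ / sum(ACD_FREQ)

df <- data.frame(
  sample_no = rep(1:REPEATS, each = SAMPLE_SIZE),
  care_type = as.factor(sample(CARE_TYPES, N, prob = CARE_TYPE_P, replace = TRUE)),
  starting_age = runif(N, 0, 36)
) |>
  mutate(
    # Expressions are evaluated in order, so each one may use
    # any column defined above it in the same call.
    nonmaternal_hours_per_week = ifelse(care_type == "Mother", 0, pmax(0, rnorm(N, 30, 15))),
    daycare_hours_per_week = ifelse(care_type == "Daycare", nonmaternal_hours_per_week, 0),
    nonmaternal_total_hours = nonmaternal_hours_per_week * WEEKS_PER_MONTH * (36 - starting_age),
    daycare_total_hours = daycare_hours_per_week * WEEKS_PER_MONTH * (36 - starting_age),
    secure_log_or = LOG_BASELINE_SECURE_ODDS - DAYCARE_LOG_OR_PER_HOUR * daycare_total_hours,
    secure_p = exp(secure_log_or) / (1 + exp(secure_log_or)),
    is_secure = rbinom(N, 1, secure_p),
    # Choose one of A, C, D attachment at random
    acd_random = sample(ACD, N, prob = ACD_P, replace = TRUE),
    ssp_abcd = as.factor(ifelse(is_secure, "B", acd_random))
  )
```

The order of the expressions matters: a column can only be referenced after the line that creates it, which is the same top-to-bottom dependency the original chain of mutate calls expressed.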