Timing of evaluation of several across() calls inside the same summarise() in a dplyr pipe


I'm wondering how R evaluates several across() calls in the same summarise() inside a dplyr pipe. Consider the following example:

library(dplyr)
data(iris)

iris_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      .cols = starts_with("Sepal"), 
      .fns = mean
    ),
    across(
      .cols = starts_with("Petal"),
      .fns = ~ .x[which.max(Sepal.Length)]
    )
  )

The outcome produced is not the same as that of the following code, where the two across() calls are swapped:

iris_summary_2 <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      .cols = starts_with("Petal"),
      .fns = ~ .x[which.max(Sepal.Length)]
    ),
    across(
      .cols = starts_with("Sepal"), 
      .fns = mean
    )
  )

Is this a problem linked to the order in which R evaluates the two across() calls in the same summarise()? See the image below:

(Image: timing of R evaluating two across() calls in the same summarise())

I expected R to start again from step 0 before evaluating both step 1 and step 2, but the results seem to indicate that, in step 2, R takes the vector Sepal.Length from step 1 and not from step 0 (the previous piping step). Does anyone have tips for forcing R to take the vector from step 0 without changing the code structure?


Solution

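  • Yes. summarize(), like mutate() and tibble(), evaluates its arguments sequentially, so each expression sees the most recently updated version of any variable. In your first example, Sepal.Length has already been replaced by its group mean (a single value) by the time the second across() runs, so which.max(Sepal.Length) is always 1 and you get the first Petal value of each group.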

    mtcars |>
      summarize(gear = mean(gear),
                gear2 = mean(gear) * 100)
    
        gear  gear2
    1 3.6875 368.75
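
    To see why this matters for the which.max() call in the question, here is a minimal sketch on iris. The names first_then_pick, first_rows and n_seen are made up for illustration. Once Sepal.Length has been overwritten, a later expression in the same summarise() should see a single value per group, so the pick falls back to the first row of each group.

    ```r
    library(dplyr)

    # Sepal.Length is overwritten first, so later expressions see the summary
    first_then_pick <- iris %>%
      group_by(Species) %>%
      summarise(
        Sepal.Length = mean(Sepal.Length),
        n_seen       = length(Sepal.Length),  # length of Sepal.Length as seen at this point
        Petal.Length = Petal.Length[which.max(Sepal.Length)]
      )

    # First Petal.Length of each group, for comparison
    first_rows <- iris %>%
      group_by(Species) %>%
      summarise(Petal.Length = first(Petal.Length))
    ```

    Comparing first_then_pick$Petal.Length with first_rows$Petal.Length is a quick way to confirm that the two agree.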
    

    You might consider using the .names argument to put your summary numbers in new variables that don't alter the original ones.

    iris %>%
      group_by(Species) %>%
      summarise(
        across(
          .cols = starts_with("Sepal"), 
          .fns = mean,
          .names = "{.col}_mean"
        ),
        across(
          .cols = starts_with("Petal"),
          .fns = ~ .x[which.max(Sepal.Length)],
          .names = "{.col}_max_Sepal"
        )
      )
    
    # A tibble: 3 × 5
      Species    Sepal.Length_mean Sepal.Width_mean Petal.Length_max_Sepal Petal.Width_max_Sepal
      <fct>                  <dbl>            <dbl>                  <dbl>                 <dbl>
    1 setosa                  5.01             3.43                    1.2                   0.2
    2 versicolor              5.94             2.77                    4.7                   1.4
    3 virginica               6.59             2.97                    6.4                   2
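
    If the original column names are needed in the result, one possible variation (a sketch, not the only way) is to give only the means a temporary suffix and strip it after summarise() has finished. The name iris_summary_3 and the "_mean" suffix are illustrative choices, and the sketch assumes no other column ends in "_mean".

    ```r
    library(dplyr)

    iris_summary_3 <- iris %>%
      group_by(Species) %>%
      summarise(
        # Means go into temporary *_mean columns, leaving Sepal.Length untouched
        across(starts_with("Sepal"), mean, .names = "{.col}_mean"),
        # Sepal.Length here is still the original per-group vector
        across(starts_with("Petal"), ~ .x[which.max(Sepal.Length)])
      ) %>%
      # Drop the temporary suffix once summarise() is done
      rename_with(~ sub("_mean$", "", .x), ends_with("_mean"))
    ```

    This keeps the order of the two across() calls from the question and only adds a renaming step at the end of the pipe.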