Tags: r, for-loop, dplyr, rlang, summarize

Error while using dplyr::summarize with seq_along


An altruistic member here helped me write the following code to generate variables using a for loop and dplyr::summarize. This code, as expected, works fine.

library(dplyr)  # provides %>%, select(), group_by(), summarize(), bind_cols()
library(nycflights13)

flights <- nycflights13::flights %>%
  select(carrier,distance,hour)

by_carrier <- NULL
for (i in c("distance", "hour")) {
  df <- 
    flights %>%
    dplyr::group_by(carrier) %>%
    dplyr::summarize(!!as.name(i) := sum(!!as.name(i) ))
  by_carrier <- bind_cols(by_carrier,df)
}

But when I change the for loop argument in the following manner, it encounters an error:

var_interest <- c("distance", "hour")

by_carrier <- NULL

for ( i in seq_along(var_interest)) {   
  df <- 
    flights %>%
    dplyr::group_by(carrier) %>%
    dplyr::summarize(!!as.name(i) := sum(!!as.name(i) ))
  by_carrier <- bind_cols(by_carrier,df)
}

The error is as follows:

Error: Problem with `summarise()` input `1`.
x object '1' not found
i Input `1` is `sum(`1`)`.
i The error occurred in group 1: carrier = "9E".
Run `rlang::last_error()` to see where the error occurred.

What am I missing here? Thanks in advance.


Solution

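  • Since you are using seq_along, i takes the values 1 and 2. These are positions in var_interest, not names of columns in your data, so as.name(i) builds the symbol `1` and summarize() fails with "object '1' not found". Either change the for loop to for (i in var_interest) or use var_interest[i] inside the loop.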

    library(dplyr)
    
    by_carrier <- NULL
    
    var_interest <- c("distance", "hour")
    for (i in var_interest) {   
      df <- 
        flights %>%
        dplyr::group_by(carrier) %>%
        dplyr::summarize(!!as.name(i) := sum(!!as.name(i) ))
      by_carrier <- bind_cols(by_carrier,df)
    }
    

    A better option may be to use across() instead of a loop. It summarizes all the columns in one call and keeps a single carrier column.

    flights %>%
      group_by(carrier) %>%
      summarise(across(all_of(var_interest), sum))
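
The other fix mentioned above, keeping seq_along and looking up the name with var_interest[i], is not shown in code, so here is a minimal sketch of it. The toy data frame and its values are made up for illustration (a small stand-in for flights), and col and via_across are just names chosen for this example. The last step checks that the loop and across() agree.

```r
library(dplyr)

# Hypothetical stand-in for the flights data; the values are invented.
toy <- tibble(
  carrier  = c("AA", "AA", "UA"),
  distance = c(100, 200, 300),
  hour     = c(5, 6, 7)
)

var_interest <- c("distance", "hour")

# seq_along() returns positions (1 and 2 here), not the column names.
seq_along(var_interest)

by_carrier <- NULL
for (i in seq_along(var_interest)) {
  col <- var_interest[i]  # turn the position back into a column name
  df <- toy %>%
    group_by(carrier) %>%
    summarize(!!as.name(col) := sum(!!as.name(col)))
  # As in the question, bind_cols() adds a carrier column on every pass
  # and renames the duplicates to keep the column names unique.
  by_carrier <- bind_cols(by_carrier, df)
}

# Same sums with across(), with only one carrier column.
via_across <- toy %>%
  group_by(carrier) %>%
  summarise(across(all_of(var_interest), sum))

all.equal(by_carrier$distance, via_across$distance)
all.equal(by_carrier$hour, via_across$hour)
```

If you only need the sums, the across() version is shorter. The index-based loop is mainly useful when you also need the position i for something else.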