Search code examples
rdplyrtidyverse

mutate in R dplyr using column indexes as opposed to column names


I have the following dataframe in R. I have created a new column called 'Average' which takes the mean of the first three columns in my dataframe, but I would like to do it using column indexes (e.g column 2,3,4) as opposed to the column name. Is there a way to do that?

library(tidyverse)

data <- structure(list(Model = c("Adjusted Compnents Model", "ARIMA", 
"STIF Model"), `2021-11-30` = c(0.2, 0.1, 0.3), `2021-12-31` = c(0.2, 
0.3, 0), `2022-01-31` = c(0.2, 0.5, 0.3), `2022-02-28` = c(0.1, 
0.3, 0.1), `2022-03-31` = c(0.1, 0.2, 0.1), `2022-04-30` = c(0.2, 
0.1, 0.1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L))

data %>% 
  mutate(Average = mean(`2021-11-30`:`2022-01-31`), .after = 4)

Solution

  • We may need rowMeans

    library(dplyr)
    data <- data %>%
      mutate(Average = rowMeans(across(`2021-11-30`:`2022-01-31`), 
          na.rm = TRUE), .after = 4)
    

    -output

    data
    # A tibble: 3 × 8
      Model                    `2021-11-30` `2021-12-31` `2022-01-31` Average `2022-02-28` `2022-03-31` `2022-04-30`
      <chr>                           <dbl>        <dbl>        <dbl>   <dbl>        <dbl>        <dbl>        <dbl>
    1 Adjusted Compnents Model          0.2          0.2          0.2     0.2          0.1          0.1          0.2
    2 ARIMA                             0.1          0.3          0.5     0.3          0.3          0.2          0.1
    3 STIF Model                        0.3          0            0.3     0.2          0.1          0.1          0.1
    

    Regarding indexes, we can substitute the values in the range (if we understand it correctly)

    data %>%
      mutate(Average = rowMeans(across(2:4), 
          na.rm = TRUE), .after = 4)