Mutating to return the average of a column in each row

I'm trying to calculate the mean of each column in my dataframe and write that mean back to every row of the column, for several columns with similar names. My plan was to use mutate(across(starts_with())) to select the columns I want, then ~summarize(mean()) to compute each column's mean and overwrite the column's original values. However, I get an error saying that summarize() can't be used with the class of data in the Fruits - Apples column. When I checked that column with str(), it confirmed that the values were of class character, so I converted everything with as.numeric(). I still get an error when I run my code.

# Sample Data

test<-structure(list(`Fruits - Apples` = c("1", "4"), `Fruits - Oranges` = c("2", 
"6"), `Fruits - Bananas` = c("5", "3")), row.names = c(NA, -2L
), class = c("tbl_df", "tbl", "data.frame"))

> test
# A tibble: 2 × 3
  `Fruits - Apples` `Fruits - Oranges` `Fruits - Bananas`
  <chr>             <chr>              <chr>             
1 1                 2                  5                 
2 4                 6                  3                 

# Attempted Code

test %>%
  mutate(across(everything(), ~as.numeric(.x))) %>%
  mutate(across(starts_with("Fruits -"), ~ summarize(mean = mean(.x, na.rm = T))))

# Error Code

Error in `mutate()`:
ℹ In argument: `across(starts_with("Fruits -"), ~summarize(mean = mean(.x, na.rm = T)))`.
Caused by error in `across()`:
! Can't compute column `Fruits - Apples`.
Caused by error in `UseMethod()`:
! no applicable method for 'summarise' applied to an object of class "c('double', 'numeric')"
Run `rlang::last_trace()` to see where the error occurred.

# Desired Output

  `Fruits - Apples` `Fruits - Oranges` `Fruits - Bananas`      
   2.5               4                  4                 
   2.5               4                  4                 


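  • Don't use summarize inside mutate. summarize() works on a data frame, but inside across() the .x is a plain numeric vector, which is why you get the "no applicable method" error.

    If you want the same number of rows as the input, use mutate: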

    test %>%
      mutate(across(everything(), as.numeric)) %>%
      mutate(across(starts_with("Fruits -"), ~mean(.x, na.rm = TRUE)))
    # # A tibble: 2 × 3
    #   `Fruits - Apples` `Fruits - Oranges` `Fruits - Bananas`
    #               <dbl>              <dbl>              <dbl>
    # 1               2.5                  4                  4
    # 2               2.5                  4                  4

    If you want one row per group (1 row in this case as you haven't set any groups), use summarize:

    test %>%
      mutate(across(everything(), as.numeric)) %>%
      summarize(across(starts_with("Fruits -"), ~mean(.x, na.rm = TRUE)))
    # # A tibble: 1 × 3
    #   `Fruits - Apples` `Fruits - Oranges` `Fruits - Bananas`
    #               <dbl>              <dbl>              <dbl>
    # 1               2.5                  4                  4

    Also note that if you are applying a function with no extra arguments, like as.numeric above, you don't need the ~foo(.x) form; you can just pass foo.
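To make the last point concrete, here is a minimal sketch (assuming dplyr is installed; the sample data is rebuilt with tibble() so the snippet runs on its own, and the names bare, lambda and result are just illustrative). It checks that the bare function name and the ~foo(.x) form give the same result, and shows one possible way to fold the conversion and the mean into a single across() call. This is only one way to write it, not the only correct one.

```r
library(dplyr)

# Rebuild the sample data from the question (all columns are character).
test <- tibble(
  `Fruits - Apples`  = c("1", "4"),
  `Fruits - Oranges` = c("2", "6"),
  `Fruits - Bananas` = c("5", "3")
)

# Bare function name vs. lambda form: both convert every column to numeric.
bare   <- test %>% mutate(across(everything(), as.numeric))
lambda <- test %>% mutate(across(everything(), ~ as.numeric(.x)))
identical(bare, lambda)

# Convert and average in one across() call. The lambda form is needed here
# because mean() takes an extra argument (na.rm) and wraps as.numeric().
result <- test %>%
  mutate(across(starts_with("Fruits -"), ~ mean(as.numeric(.x), na.rm = TRUE)))
result
```

The one-step version only touches the columns matched by starts_with("Fruits -"), so any other character columns in a larger data frame would be left as they are.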