function will not work with dplyr's select wrappers (contains, ends_with)


I'm trying to calculate row means on a dataset. I found a helpful function someone wrote here (dplyr - using mutate() like rowmeans()). It works when I type out every column name, but not when I select the columns with a dplyr helper function.

Why does this work:

#The rowmeans function that works

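my_rowmeans = function(..., na.rm=TRUE){
  # Each argument passed through ... is expected to be one column (a vector)
  x =
    if (na.rm) {
      # Replace NAs with 0 so they do not contribute to the row sums
      lapply(list(...), function(x) replace(x, is.na(x), as(0, class(x))))
    } else {
      list(...)
    }

  # Count the non-missing values in each row
  d = Reduce(function(x,y) x+!is.na(y), list(...), init=0)

  # Row sum divided by the number of non-missing values
  Reduce(`+`, x)/d
}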


#The data
library(tidyverse)
data <- tibble(id = c(1:4),
               turn_intent_1 = c(5, 1, 1, 4),
               turn_intent_2 = c(5, 1, 1, 3),
               turn_intent_3R = c(5, 5, 1, 3))

#The code that is cumbersome but works

data %>%
  mutate(turn_intent_agg = my_rowmeans(turn_intent_1, turn_intent_2, turn_intent_3R))

#The output

# A tibble: 4 x 5
     id turn_intent_1 turn_intent_2 turn_intent_3R turn_intent_agg
  <int>         <dbl>         <dbl>          <dbl>           <dbl>
1     1             5             5              5            5   
2     2             1             1              5            2.33
3     3             1             1              1            1   
4     4             4             3              3            3.33

But this does not work:

#The code
data %>%
  mutate(turn_intent_agg = select(., contains("turn")) %>% 
           my_rowmeans())

#The output
Error in class1Def@contains[[class2]] : no such index at level 1

Of course, I can type out each column, but this dataset has many columns, so it would be much easier to use these helpers.

I need the output to keep all columns (including id), like the working output shown above.

Thank you!


Solution

  • I think that you can simplify it to:

    data %>%
     mutate(turn_intent_agg = rowMeans(select(., contains("turn"))))
    
         id turn_intent_1 turn_intent_2 turn_intent_3R turn_intent_agg
      <int>         <dbl>         <dbl>          <dbl>           <dbl>
    1     1             5             5              5            5   
    2     2             1             1              5            2.33
    3     3             1             1              1            1   
    4     4             4             3              3            3.33
    

    You can also add the na.rm = TRUE argument:

    data %>%
     mutate(turn_intent_agg = rowMeans(select(., contains("turn")), na.rm = TRUE))
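As for why the original attempt fails: my_rowmeans() expects each column as a separate argument, but piping select(...) into it hands over one whole tibble. Inside the function, class(x) is then a length-3 vector ("tbl_df", "tbl", "data.frame"), and as(0, class(x)) appears to be what raises the "no such index at level 1" error. Below is a minimal sketch of two ways around that, assuming dplyr 1.0.0 or later for across(). The names res_docall and res_across are only illustrative, and my_rowmeans() is repeated from the question so the block runs on its own.

```r
library(dplyr)

# my_rowmeans() as defined in the question: one column per argument
my_rowmeans <- function(..., na.rm = TRUE) {
  x <- if (na.rm) {
    lapply(list(...), function(x) replace(x, is.na(x), as(0, class(x))))
  } else {
    list(...)
  }
  d <- Reduce(function(x, y) x + !is.na(y), list(...), init = 0)
  Reduce(`+`, x) / d
}

data <- tibble(id = 1:4,
               turn_intent_1 = c(5, 1, 1, 4),
               turn_intent_2 = c(5, 1, 1, 3),
               turn_intent_3R = c(5, 5, 1, 3))

# Option 1: keep my_rowmeans(), but pass the selected columns as
# separate arguments by turning the tibble into a list for do.call()
res_docall <- data %>%
  mutate(turn_intent_agg = do.call(
    my_rowmeans,
    unname(as.list(select(., contains("turn"))))
  ))

# Option 2: base rowMeans() with across(), which avoids the `.` placeholder
res_across <- data %>%
  mutate(turn_intent_agg = rowMeans(across(contains("turn")), na.rm = TRUE))
```

Both variants keep every original column (including id) and add turn_intent_agg at the end. Option 2 is probably the easier one to maintain, since it does not depend on the custom function at all.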