
dplyr - Mean for multiple columns


I want to calculate the row-wise mean of several columns and store it in a new column, using dplyr and without melting and merging.

> head(growth2)
  CODE_COUNTRY CODE_PLOT IV12_ha_yr IV23_ha_yr IV34_ha_yr IV14_ha_yr IV24_ha_yr IV13_ha_yr
1            1         6       4.10       6.97         NA         NA         NA       4.58
2            1        17       9.88       8.75         NA         NA         NA       8.25
3            1        30         NA         NA         NA         NA         NA         NA
4            1        37      15.43      15.07      11.89      10.00      12.09      14.33
5            1        41      20.21      15.01      14.72      11.31      13.27      17.09
6            1        46      12.64      14.36      13.65       9.07      12.47      12.36
> 

I need a new column within the dataset with the mean of all the IV columns. I tried this:

growth2 %>% 
  group_by(CODE_COUNTRY, CODE_PLOT) %>%
  summarise(IVmean=mean(IV12_ha_yr:IV13_ha_yr, na.rm=TRUE))

It returned different errors depending on the example I used, such as:

Error in NA_real_:NA_real_ : NA/NaN argument

or

Error in if (trim > 0 && n) { : missing value where TRUE/FALSE needed
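A likely cause of the first error: inside summarise(), `IV12_ha_yr:IV13_ha_yr` is not a range of columns. It is base R's `:` sequence operator applied to the column values. Since each CODE_COUNTRY/CODE_PLOT group in the data shown appears to hold a single row, each column is one value per group. The sketch below uses hypothetical variable names holding rows 3 and 4 of `head(growth2)` to show both outcomes: an error when the values are NA, and a silently wrong mean otherwise.

```r
# Hypothetical stand-ins for one group's IV12_ha_yr and IV13_ha_yr values.
# Row 3 of head(growth2): both values are NA, so `:` raises an error.
iv12_row3 <- NA_real_
iv13_row3 <- NA_real_
msg <- tryCatch(iv12_row3:iv13_row3, error = function(e) conditionMessage(e))
print(msg)

# Row 4 of head(growth2): `:` counts down from 15.43 towards 14.33 in steps
# of 1, so mean() averages that sequence, not the six IV columns.
iv12_row4 <- 15.43
iv13_row4 <- 14.33
seq_row4 <- iv12_row4:iv13_row4
print(seq_row4)
print(mean(seq_row4))
```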

Solution

  • You don't need to group: rowMeans() already works row by row. Just select() the IV columns and add their row means with mutate():

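    library(dplyr)
    # rowMeans() averages the selected IV columns row by row; na.rm = TRUE skips NAs
    growth2 <- mutate(growth2, IVMean = rowMeans(select(growth2, starts_with("IV")), na.rm = TRUE))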
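A self-contained sketch of the same idea, rebuilding the six rows printed by `head(growth2)` so it can be run as-is. As a variation (an assumption, not part of the answer above), it uses `across()` inside `mutate()`, which returns the selected columns as a data frame that `rowMeans()` can consume. This assumes dplyr >= 1.0.0.

```r
library(dplyr)

# The six rows printed by head(growth2), rebuilt for a reproducible example
growth2 <- data.frame(
  CODE_COUNTRY = c(1, 1, 1, 1, 1, 1),
  CODE_PLOT    = c(6, 17, 30, 37, 41, 46),
  IV12_ha_yr   = c(4.10, 9.88, NA, 15.43, 20.21, 12.64),
  IV23_ha_yr   = c(6.97, 8.75, NA, 15.07, 15.01, 14.36),
  IV34_ha_yr   = c(NA, NA, NA, 11.89, 14.72, 13.65),
  IV14_ha_yr   = c(NA, NA, NA, 10.00, 11.31, 9.07),
  IV24_ha_yr   = c(NA, NA, NA, 12.09, 13.27, 12.47),
  IV13_ha_yr   = c(4.58, 8.25, NA, 14.33, 17.09, 12.36)
)

# across() with only a column selection returns those columns unchanged,
# so rowMeans() averages the IV columns of each row, skipping NAs
growth2 <- growth2 %>%
  mutate(IVmean = rowMeans(across(starts_with("IV")), na.rm = TRUE))

print(growth2$IVmean)
```

Two things to watch for. A row where every IV value is NA (plot 30 here) gets `NaN`, because `na.rm = TRUE` leaves nothing to average. Also, `starts_with()` ignores case by default, so the new `IVmean`/`IVMean` column itself matches `starts_with("IV")`. Rerunning the line on the updated data would fold the previous mean into the new one. A narrower selector such as `ends_with("_ha_yr")` avoids that.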