Search code examples
rmeandplyr

Calculating new column as mean of selected columns in R data frame


I have a large (ish) data frame and I want to use dplyr mutate function (or suitable alternative) to calculate the mean of selected columns.

For example, suppose I had a data frame as follows:

colnames(dall)
[1] "Code"    "LA.Name" "LA_Name" "Jan.20"  "Feb.20"  "Mar.20"  "Apr.20"  "May.20"  "Jun.20" 
[10] "Jul.20"  "Aug.20"  "Sep.20"  "Oct.20"  "Nov.20"  "Dec.20"  "Jan.19"  "Feb.19"  "Mar.19" 
[19] "Apr.19"  "May.19"  "Jun.19"  "Jul.19"  "Aug.19"  "Sep.19"  "Oct.19"  "Nov.19"  "Dec.19" 
[28] "Jan.18"  "Feb.18"  "Mar.18"  "Apr.18"  "May.18"  "Jun.18"  "Jul.18"  "Aug.18"  "Sep.18" 
[37] "Oct.18"  "Nov.18"  "Dec.18"  "Jan.17"  "Feb.17"  "Mar.17"  "Apr.17"  "May.17"  "Jun.17" 
[46] "Jul.17"  "Aug.17"  "Sep.17"  "Oct.17"  "Nov.17"  "Dec.17"  "Jan.16"  "Feb.16"  "Mar.16" 
[55] "Apr.16"  "May.16"  "Jun.16"  "Jul.16"  "Aug.16"  "Sep.16"  "Oct.16"  "Nov.16"  "Dec.16" 
[64] "Jan.15"  "Feb.15"  "Mar.15"  "Apr.15"  "May.15"  "Jun.15"  "Jul.15"  "Aug.15"  "Sep.15" 
[73] "Oct.15"  "Nov.15"  "Dec.15" 

I'm trying to create a new column with the mean of January data from 2015 to 2019.

Have tried several methods. Latest as follows:

mutate(dall, mJan15to19 = mean(Jan.15,Jan.16,Jan.17,Jan.18,Jan.19))

I get the following back:

Error in mean.default(Jan.15, Jan.16, Jan.17, Jan.18, Jan.19) : 
  'trim' must be numeric of length one
In addition: Warning message:
In if (na.rm) x <- x[!is.na(x)] :
  the condition has length > 1 and only the first element will be used

The content of the cells I'm trying to average is a numeric

Can you help?

UPDATE:

Tried:

head(dall) %>% mutate(new = rowMeans(select(., Jan.15:Jan.19)))

Returned the following:

 head(dall) %>% mutate(new = rowMeans(select(., Jan.15:Jan.19)))
       Code              LA.Name              LA_Name Jan.20 Feb.20 Mar.20 Apr.20 May.20 Jun.20
1 E06000001          Hartlepool            Hartlepool    108     76     89     NA     NA     NA
2 E06000002        Middlesbrough        Middlesbrough    178     98    135     NA     NA     NA
3 E06000003 Redcar and Cleveland Redcar and Cleveland    150    148    126     NA     NA     NA
4 E06000004     Stockton-on-Tees     Stockton-on-Tees    202    124    175     NA     NA     NA
5 E06000005           Darlington           Darlington    137     90    116     NA     NA     NA
6 E06000006              Halton                Halton    141    101    115     NA     NA     NA
  Jul.20 Aug.20 Sep.20 Oct.20 Nov.20 Dec.20 Jan.19 Feb.19 Mar.19 Apr.19 May.19 Jun.19 Jul.19 Aug.19
1     NA     NA     NA     NA     NA     NA     92     87     68     81    108     77     97     73
2     NA     NA     NA     NA     NA     NA    144    116    126    113    123    100    113    118
3     NA     NA     NA     NA     NA     NA    146    152    133    135    114    101    140    116
4     NA     NA     NA     NA     NA     NA    192    166    160    133    157    126    136    149
5     NA     NA     NA     NA     NA     NA    138    110    104     84    115     75     86    104
6     NA     NA     NA     NA     NA     NA    114     95     83     92     97     88     98     83
  Sep.19 Oct.19 Nov.19 Dec.19 Jan.18 Feb.18 Mar.18 Apr.18 May.18 Jun.18 Jul.18 Aug.18 Sep.18 Oct.18
1     69     87     85     99    126     89     97     97     77     65     64     61     76     71
2    117    127    119    121    204    117    112    132    129    106     96    115    103    111
3    108    139    134    145    225    152    135    114    122    116    113    108    113    154
4    136    177    159    173    256    171    189    142    146    149    142    144    128    179
5     77     95     96    119    127    125     98     98    104     76     77     84     79    109
6     91    106    102    121    170    106    114     93    102     93     83    111     91     93
  Nov.18 Dec.18 Jan.17 Feb.17 Mar.17 Apr.17 May.17 Jun.17 Jul.17 Aug.17 Sep.17 Oct.17 Nov.17 Dec.17
1     94     97    116     83    101     76     85     86     52     80     85     88     98     94
2    108    121    151    137    131    111    112    114    127    112    113    120    150    151
3    113    129    171    126    158    104    120    134    122    119    107    145    126    134
4    152    174    177    166    176    129    157    148    141    148    168    143    142    186
5     84    100    103    110    105     88    101     89     73     92     87     96    102     86
6    115     96    117     95    115     94     99    105     93    110    110     86     89     84
  Jan.16 Feb.16 Mar.16 Apr.16 May.16 Jun.16 Jul.16 Aug.16 Sep.16 Oct.16 Nov.16 Dec.16 Jan.15 Feb.15
1     79     97     90     92     82     87     75     74     74     79     68     93    138     99
2    116    143    138    131    139     95    107    107    102    121    125    142    166    144
3    129    132    147    141    137    137    115    108    115    127    135    124    179    144
4    159    176    171    191    146    169    160    128    161    143    159    161    263    169
5    105    113     85     92     87     92     74     78     91     85     88     86    149     78
6    113     98    108    117     90     99     92    107    101     93    123    111    162    105
  Mar.15 Apr.15 May.15 Jun.15 Jul.15 Aug.15 Sep.15 Oct.15 Nov.15 Dec.15       new
1    109     69     82     85     71     65     74     82     81    112  85.89796
2    130    116    127    124    119    104    107     95    115    101 123.51020
3    129    142    136    125    114    108    120    117    108    140 131.61224
4    155    163    127    129    142    101    161    148    140    180 161.30612
5    105    102     78     90    112     91     83    109     97     96  96.34694
6    100    102     99     90     90     81    102     98     86    107 103.02041
>

I have a new column, but the calculation is incorrect. I want an average of all of the 'Jan' columns except for 'Jan.20'


Solution

  • Since you wanted rowwise mean, this will work:

    dall$mJan15to19 = rowMeans(dall[,c("Jan.15","Jan.16","Jan.17","Jan.18","Jan.19")])