I have a large (ish) data frame and I want to use dplyr mutate function (or suitable alternative) to calculate the mean of selected columns.
For example, suppose I had a data frame as follows:
colnames(dall)
[1] "Code" "LA.Name" "LA_Name" "Jan.20" "Feb.20" "Mar.20" "Apr.20" "May.20" "Jun.20"
[10] "Jul.20" "Aug.20" "Sep.20" "Oct.20" "Nov.20" "Dec.20" "Jan.19" "Feb.19" "Mar.19"
[19] "Apr.19" "May.19" "Jun.19" "Jul.19" "Aug.19" "Sep.19" "Oct.19" "Nov.19" "Dec.19"
[28] "Jan.18" "Feb.18" "Mar.18" "Apr.18" "May.18" "Jun.18" "Jul.18" "Aug.18" "Sep.18"
[37] "Oct.18" "Nov.18" "Dec.18" "Jan.17" "Feb.17" "Mar.17" "Apr.17" "May.17" "Jun.17"
[46] "Jul.17" "Aug.17" "Sep.17" "Oct.17" "Nov.17" "Dec.17" "Jan.16" "Feb.16" "Mar.16"
[55] "Apr.16" "May.16" "Jun.16" "Jul.16" "Aug.16" "Sep.16" "Oct.16" "Nov.16" "Dec.16"
[64] "Jan.15" "Feb.15" "Mar.15" "Apr.15" "May.15" "Jun.15" "Jul.15" "Aug.15" "Sep.15"
[73] "Oct.15" "Nov.15" "Dec.15"
I'm trying to create a new column with the mean of January data from 2015 to 2019.
Have tried several methods. Latest as follows:
mutate(dall, mJan15to19 = mean(Jan.15,Jan.16,Jan.17,Jan.18,Jan.19))
I get the following back:
Error in mean.default(Jan.15, Jan.16, Jan.17, Jan.18, Jan.19) :
'trim' must be numeric of length one
In addition: Warning message:
In if (na.rm) x <- x[!is.na(x)] :
the condition has length > 1 and only the first element will be used
The content of the cells I'm trying to average is a numeric
Can you help?
UPDATE:
Tried:
head(dall) %>% mutate(new = rowMeans(select(., Jan.15:Jan.19)))
Returned the following:
head(dall) %>% mutate(new = rowMeans(select(., Jan.15:Jan.19)))
Code LA.Name LA_Name Jan.20 Feb.20 Mar.20 Apr.20 May.20 Jun.20
1 E06000001 Hartlepool Hartlepool 108 76 89 NA NA NA
2 E06000002 Middlesbrough Middlesbrough 178 98 135 NA NA NA
3 E06000003 Redcar and Cleveland Redcar and Cleveland 150 148 126 NA NA NA
4 E06000004 Stockton-on-Tees Stockton-on-Tees 202 124 175 NA NA NA
5 E06000005 Darlington Darlington 137 90 116 NA NA NA
6 E06000006 Halton Halton 141 101 115 NA NA NA
Jul.20 Aug.20 Sep.20 Oct.20 Nov.20 Dec.20 Jan.19 Feb.19 Mar.19 Apr.19 May.19 Jun.19 Jul.19 Aug.19
1 NA NA NA NA NA NA 92 87 68 81 108 77 97 73
2 NA NA NA NA NA NA 144 116 126 113 123 100 113 118
3 NA NA NA NA NA NA 146 152 133 135 114 101 140 116
4 NA NA NA NA NA NA 192 166 160 133 157 126 136 149
5 NA NA NA NA NA NA 138 110 104 84 115 75 86 104
6 NA NA NA NA NA NA 114 95 83 92 97 88 98 83
Sep.19 Oct.19 Nov.19 Dec.19 Jan.18 Feb.18 Mar.18 Apr.18 May.18 Jun.18 Jul.18 Aug.18 Sep.18 Oct.18
1 69 87 85 99 126 89 97 97 77 65 64 61 76 71
2 117 127 119 121 204 117 112 132 129 106 96 115 103 111
3 108 139 134 145 225 152 135 114 122 116 113 108 113 154
4 136 177 159 173 256 171 189 142 146 149 142 144 128 179
5 77 95 96 119 127 125 98 98 104 76 77 84 79 109
6 91 106 102 121 170 106 114 93 102 93 83 111 91 93
Nov.18 Dec.18 Jan.17 Feb.17 Mar.17 Apr.17 May.17 Jun.17 Jul.17 Aug.17 Sep.17 Oct.17 Nov.17 Dec.17
1 94 97 116 83 101 76 85 86 52 80 85 88 98 94
2 108 121 151 137 131 111 112 114 127 112 113 120 150 151
3 113 129 171 126 158 104 120 134 122 119 107 145 126 134
4 152 174 177 166 176 129 157 148 141 148 168 143 142 186
5 84 100 103 110 105 88 101 89 73 92 87 96 102 86
6 115 96 117 95 115 94 99 105 93 110 110 86 89 84
Jan.16 Feb.16 Mar.16 Apr.16 May.16 Jun.16 Jul.16 Aug.16 Sep.16 Oct.16 Nov.16 Dec.16 Jan.15 Feb.15
1 79 97 90 92 82 87 75 74 74 79 68 93 138 99
2 116 143 138 131 139 95 107 107 102 121 125 142 166 144
3 129 132 147 141 137 137 115 108 115 127 135 124 179 144
4 159 176 171 191 146 169 160 128 161 143 159 161 263 169
5 105 113 85 92 87 92 74 78 91 85 88 86 149 78
6 113 98 108 117 90 99 92 107 101 93 123 111 162 105
Mar.15 Apr.15 May.15 Jun.15 Jul.15 Aug.15 Sep.15 Oct.15 Nov.15 Dec.15 new
1 109 69 82 85 71 65 74 82 81 112 85.89796
2 130 116 127 124 119 104 107 95 115 101 123.51020
3 129 142 136 125 114 108 120 117 108 140 131.61224
4 155 163 127 129 142 101 161 148 140 180 161.30612
5 105 102 78 90 112 91 83 109 97 96 96.34694
6 100 102 99 90 90 81 102 98 86 107 103.02041
>
I have a new column, but the calculation is incorrect. I want an average of all of the 'Jan' columns except for 'Jan.20'
Since you wanted rowwise mean, this will work:
dall$mJan15to19 = rowMeans(dall[,c("Jan.15","Jan.16","Jan.17","Jan.18","Jan.19")])