How do I calculate the mean square for each of 2019_Preston_STD, 2019_Preston_V1, 2019_Preston_V2, etc. using the Value column, and then the adjmth1 and adjmth3 columns?
structure(list(IDX = c("2019_Preston_STD", "2019_Preston_V1",
"2019_Preston_V2", "2019_Preston_V3", "2019_Preston_W1", "2019_Preston_W2"
), Value = c(3L, 2L, 3L, 2L, 3L, 5L), adjmth1 = c(2.87777777777778,
1.85555555555556, 2.01111111111111, 1.77777777777778, 3.62222222222222,
4.45555555555556), adjmth3 = c(2.9328763348507, 2.08651828334684,
2.80282946626847, 2.15028039284054, 2.68766916156347, 4.51425274916654
), adjmth13 = c(2.81065411262847, 1.82585524933201, 1.81394057737959,
1.40785681078568, 3.30989138378569, 4.7301083495049)), row.names = 29:34, class = "data.frame")
This task can be done in many ways, as shown in the link that @r2evans pointed out. My favorite is dplyr, using summarize(across()), because to me its syntax is easy to understand and easy to apply to many columns. It also presents the resulting numbers in a nice format.
For example, from the iris data I want to get the arithmetic mean of Sepal.Length, Petal.Length, and Petal.Width for each species: setosa, versicolor, and virginica. Here is the head of the data:
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
And here is how to get the mean in each species:
library(dplyr)
iris %>% group_by(Species) %>%
  summarize(across(c(Sepal.Length, Petal.Length, Petal.Width), mean))
# A tibble: 3 x 4
# Species Sepal.Length Petal.Length Petal.Width
# <fct> <dbl> <dbl> <dbl>
# 1 setosa 5.01 1.46 0.246
# 2 versicolor 5.94 4.26 1.33
# 3 virginica 6.59 5.55 2.03
As for your task, you first need to define a function for the mean square (its definition varies slightly between references). Then you apply it to your data frame using summarize(across()).
For example, you define the mean square function as follows:
meansq <- function(x) sum((x-mean(x))^2)/(length(x)-1)
Note: This definition requires that length(x) is not 1; otherwise NaN is produced (the result is 0/0).
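As a quick sanity check, here is a minimal sketch that runs meansq on the Value column from the question (ignoring any grouping) and on a single number. With this particular definition the result should coincide with the sample variance, so base R's var() is used for comparison. The variable names ms, same_as_var, and single are just for illustration.

```r
# Mean square as defined in this answer
meansq <- function(x) sum((x - mean(x))^2) / (length(x) - 1)

# The Value column from the question, taken as one vector (no grouping)
x <- c(3, 2, 3, 2, 3, 5)

ms <- meansq(x)
print(ms)  # 1.2

# With this definition the result matches the sample variance
same_as_var <- isTRUE(all.equal(ms, var(x)))
print(same_as_var)  # TRUE

# A single observation gives 0/0, which is NaN
single <- meansq(7)
print(is.nan(single))  # TRUE
```

If this is the definition you want, you could pass var directly to across() instead of meansq. Keep in mind that any group containing only one row will come out as NaN.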
You can apply it to your data frame newdata as follows:
newdata %>% group_by(IDX) %>%
  summarize(across(c(Value, adjmth1, adjmth3), meansq))
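To see the grouped call end to end, here is a small self-contained sketch. The data frame toy and its values are made up for illustration (they are not the asker's data), with each IDX repeated three times so that every group has more than one row. It assumes dplyr 1.0.0 or later, where across() is available.

```r
library(dplyr)

# Mean square as defined in this answer
meansq <- function(x) sum((x - mean(x))^2) / (length(x) - 1)

# Made-up toy data: two hypothetical IDX groups with three rows each
toy <- data.frame(
  IDX     = c("A", "A", "A", "B", "B", "B"),
  Value   = c(1, 2, 3, 2, 4, 6),
  adjmth1 = c(1.5, 2.5, 3.5, 1, 1, 1)
)

# One row per IDX, one mean-square column per selected variable
result <- toy %>%
  group_by(IDX) %>%
  summarize(across(c(Value, adjmth1), meansq))

print(result)
# Value:   A = 1, B = 4
# adjmth1: A = 1, B = 0
```

In the sample shown in the question every IDX appears only once, so if the full data looks the same, grouping by IDX would give NaN everywhere; in that case you may need to group by a coarser key.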