I have a data frame Mesure
and I wish to determine for each row, a weighted mean like this :
weighted_mean = ((mean_Mesure x nbr_Mesure) + (mean_DL x nbr_DL)) / (nbr_Mesure + nbr_DL)
I know there is a weighted.mean
function but I failed to obtain a new column "weighted_mean"
And, is it an issue if each row does not necessary have the 4 values to obtain this formula (such as row 6 in Mesure
) ?
> head(Mesure)
Row.names mean_Mesure nbr_Mesure mean_DL nbr_DL
2 Aquatic_moss.BE-7 123 4 542 12
3 Aquatic_moss.CO-57 100 7 117 14
4 Aquatic_moss.CO-58 120 5 145 12
5 Aquatic_moss.CO-60 140 5 153 12
6 Aquatic_moss.CS-134 146 15
You can use the rowwise()
function in the new dplyr
:
library(dplyr) # 1.0.0
Mesure %>%
rowwise() %>%
mutate(weighted.mean = ((mean_Mesure * nbr_Mesure) + (mean_DL * nbr_DL)) / (nbr_Mesure + nbr_DL))
# A tibble: 5 x 6
# Rowwise:
Row.names mean_Mesure nbr_Mesure mean_DL nbr_DL weighted.mean
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Aquatic_moss.BE-7 123 4 542 12 437.
2 Aquatic_moss.CO-57 100 7 117 14 111.
3 Aquatic_moss.CO-58 120 5 145 12 138.
4 Aquatic_moss.CO-60 140 5 153 12 149.
5 Aquatic_moss.CS-134 NA NA 146 15 NA
EDIT
If we want to replace NA
s with 0, then we can use the na_replace()
function from tidyr
:
library(dplyr)
library(tidyr) # 1.1.0
Mesure %>%
replace_na(list(mean_Mesure = 0,
nbr_Mesure = 0,
mean_DL = 0,
nbr_DL = 0)) %>%
rowwise() %>%
mutate(weighted.mean = ((mean_Mesure * nbr_Mesure) + (mean_DL * nbr_DL)) / (nbr_Mesure + nbr_DL))
# A tibble: 5 x 6
# Rowwise:
Row.names mean_Mesure nbr_Mesure mean_DL nbr_DL weighted.mean
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Aquatic_moss.BE-7 123 4 542 12 437.
2 Aquatic_moss.CO-57 100 7 117 14 111.
3 Aquatic_moss.CO-58 120 5 145 12 138.
4 Aquatic_moss.CO-60 140 5 153 12 149.
5 Aquatic_moss.CS-134 0 0 146 15 146
DATA
Mesure <- structure(list(Row.names = c("Aquatic_moss.BE-7", "Aquatic_moss.CO-57",
"Aquatic_moss.CO-58", "Aquatic_moss.CO-60", "Aquatic_moss.CS-134"
), mean_Mesure = c(123, 100, 120, 140, NA), nbr_Mesure = c(4,
7, 5, 5, NA), mean_DL = c(542, 117, 145, 153, 146), nbr_DL = c(12,
14, 12, 12, 15)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))