Search code examples
rweighted-average

Determine the weighted mean of different columns in a data frame


I have a data frame Mesure and I wish to determine for each row, a weighted mean like this :

weighted_mean = ((mean_Mesure x nbr_Mesure) + (mean_DL x nbr_DL)) / (nbr_Mesure + nbr_DL)

I know there is a weighted.mean function but I failed to obtain a new column "weighted_mean"

And, is it an issue if each row does not necessary have the 4 values to obtain this formula (such as row 6 in Mesure) ?

> head(Mesure)
         Row.names         mean_Mesure nbr_Mesure  mean_DL    nbr_DL
2    Aquatic_moss.BE-7     123            4         542        12
3   Aquatic_moss.CO-57     100            7         117        14         
4   Aquatic_moss.CO-58     120            5         145        12           
5   Aquatic_moss.CO-60     140            5         153        12 
6  Aquatic_moss.CS-134                              146        15       


Solution

  • You can use the rowwise() function in the new dplyr:

    library(dplyr) # 1.0.0
    
    Mesure %>%
            rowwise() %>%
            mutate(weighted.mean = ((mean_Mesure * nbr_Mesure) + (mean_DL * nbr_DL)) / (nbr_Mesure + nbr_DL))
    
    # A tibble: 5 x 6
    # Rowwise: 
      Row.names           mean_Mesure nbr_Mesure mean_DL nbr_DL weighted.mean
      <chr>                     <dbl>      <dbl>   <dbl>  <dbl>         <dbl>
    1 Aquatic_moss.BE-7           123          4     542     12          437.
    2 Aquatic_moss.CO-57          100          7     117     14          111.
    3 Aquatic_moss.CO-58          120          5     145     12          138.
    4 Aquatic_moss.CO-60          140          5     153     12          149.
    5 Aquatic_moss.CS-134          NA         NA     146     15           NA 
    

    EDIT

    If we want to replace NAs with 0, then we can use the na_replace() function from tidyr:

    library(dplyr)
    library(tidyr) # 1.1.0
    
    Mesure %>%
            replace_na(list(mean_Mesure = 0,
                            nbr_Mesure = 0,
                            mean_DL = 0,
                            nbr_DL = 0)) %>%
            rowwise() %>%
            mutate(weighted.mean = ((mean_Mesure * nbr_Mesure) + (mean_DL * nbr_DL)) / (nbr_Mesure + nbr_DL))
    
    # A tibble: 5 x 6
    # Rowwise: 
      Row.names           mean_Mesure nbr_Mesure mean_DL nbr_DL weighted.mean
      <chr>                     <dbl>      <dbl>   <dbl>  <dbl>         <dbl>
    1 Aquatic_moss.BE-7           123          4     542     12          437.
    2 Aquatic_moss.CO-57          100          7     117     14          111.
    3 Aquatic_moss.CO-58          120          5     145     12          138.
    4 Aquatic_moss.CO-60          140          5     153     12          149.
    5 Aquatic_moss.CS-134           0          0     146     15          146 
    

    DATA

    Mesure <- structure(list(Row.names = c("Aquatic_moss.BE-7", "Aquatic_moss.CO-57", 
    "Aquatic_moss.CO-58", "Aquatic_moss.CO-60", "Aquatic_moss.CS-134"
    ), mean_Mesure = c(123, 100, 120, 140, NA), nbr_Mesure = c(4, 
    7, 5, 5, NA), mean_DL = c(542, 117, 145, 153, 146), nbr_DL = c(12, 
    14, 12, 12, 15)), row.names = c(NA, -5L), class = c("tbl_df", 
    "tbl", "data.frame"))