Search code examples
rdataframeweighted-average

How can I calculate multiple weighted means in each column?


strong textI am trying to find a way to calculate weighted means for multiple years for each country. My data is currently formatted like:

df <- data.frame(
  YEAR_CALENDAR=c(2020, 2020, 2020, 2020, 2020, 2021, 2021, 2021, 2021, 2021),
  age_group=c("15-29", "30-44", "45-59", "60-74", "Over 75", "15-29", "30-44", "45-59", "60-74", "Over 75"),
  a=c(3.85, 3.66, 3.76, 2.70, 3.10, 4.32, 4.64, 3.67, 3.45, 4.56),
  b=c(3.56, 3.67, 3.72, 3.89, 4.23, 4.28, 4.27, 3.12, 3.46, 3.97),
  weights=rep(c(0.3333784699, 0.2890995261, 0.2161137441,
                  0.1203791469, 0.04150304671), 2))

I would like to have a data frame with the weighted average of Country a and and Country b in each year, using the coefficients in the weights column. The resulting data frame would have columns for Year, Country a, Country b, with the values being the weighted average of that country in that year.

My approach was to multiply the weights column to the a column and b column. Then, I would try to add the values of 5 rows together (for each year) to find the weighted average. In the first step of multiplying the weights, I am getting a "non-numeric argument to binary operator" error. The code is below:

new_df <- df %>%
  mutate(across(3:4, .*df$weights))

I was in need of a fix for my approach, or for someone to help me get the results through a different method.

EDIT: Okay, the multiplication part works! Thank you for the help. I was wondering if there was a way to add the values in each age_group based on the country and year, so that I have one value for each country in each year.

Many thanks!


Solution

  • You can simplify things by using the base R weighted.mean function inside summarize

    df %>%
      group_by(YEAR_CALENDAR) %>%
      summarise(across(a:b, ~ weighted.mean(.x, weights)))
    #> # A tibble: 2 x 3
    #>   YEAR_CALENDAR     a     b
    #>           <dbl> <dbl> <dbl>
    #> 1          2020  3.61  3.69
    #> 2          2021  4.18  3.92