Error "'x' and 'w' must have the same length" when computing a weighted mean within a function


I have a data set like this:

df = read.table(text='    location total year TR  TY  TU TJ
  A     822400 2010 0.09 0.09 0.07    0.07
  A     822400 2010 0.13 0.08 0.08    0.06
  B     822400 2010 0.18 0.07 0.10    0.05
  B     565000 2009 0.05 0.05 0.04    0.04
  B     565000 2009 0.07 0.04 0.04    0.03
  A     565000 2008 0.10 0.03 0.05    0.02',header=T)

I want to compute the total-weighted mean of the two locations, by year and by property (TR, TY, TU or TJ), using a function. To this end I wrote this:

total.weighted.mean <- function(df, properties, years){
  
  dff<-filter(df, year==years)
  
  res<-dff%>%
    group_by(location) %>% 
    mutate(wt = weighted.mean(total, properties))
  
  print(res)
  
}

total.weighted.mean( df, properties = "TR", years = 2009:2010)

But the function throws this error:

Error in weighted.mean.default(total, properties) : 
  'x' and 'w' must have the same length 

When I compute it outside the function, I get this:

  location  total  year    TR    TY    TU    TJ     wt
  <chr>     <int> <int> <dbl> <dbl> <dbl> <dbl>  <dbl>
1 A        822400  2010  0.13  0.08  0.08  0.06 732310
2 B        565000  2009  0.07  0.04  0.04  0.03 732310

Is it correct to get the same wt for each location, given that the locations have different total values?


Solution

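  • The main issue is that you pass the weights variable as a string, so weighted.mean() receives the length-one character vector "TR" instead of the TR column, which is why 'x' and 'w' differ in length. To tell dplyr that you mean the variable in your dataset you could, for example, use the .data pronoun. Additionally, when filtering for several years you should use %in% instead of ==, because == recycles years element-wise along the year column rather than testing membership: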

    library(dplyr)
    
    df = read.table(text='    location total year TR  TY  TU TJ
      A     822400 2010 0.09 0.09 0.07    0.07
      A     822400 2010 0.13 0.08 0.08    0.06
      B     822400 2010 0.18 0.07 0.10    0.05
      B     565000 2009 0.05 0.05 0.04    0.04
      B     565000 2009 0.07 0.04 0.04    0.03
      A     565000 2008 0.10 0.03 0.05    0.02', header = TRUE)
    
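    total.weighted.mean <- function(df, properties, years) {
      
      # Keep rows whose year is any of `years` (membership test, no recycling)
      dff <- filter(df, year %in% years)
      
      # .data[[properties]] looks up the column named by the string `properties`
      res <- dff %>%
        group_by(location) %>% 
        mutate(wt = weighted.mean(total, .data[[properties]]))
      
      res
      
    }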
    
    total.weighted.mean( df, properties = "TR", years = 2009:2010)
    #> # A tibble: 5 x 8
    #> # Groups:   location [2]
    #>   location  total  year    TR    TY    TU    TJ     wt
    #>   <chr>     <int> <int> <dbl> <dbl> <dbl> <dbl>  <dbl>
    #> 1 A        822400  2010  0.09  0.09  0.07  0.07 822400
    #> 2 A        822400  2010  0.13  0.08  0.08  0.06 822400
    #> 3 B        822400  2010  0.18  0.07  0.1   0.05 719440
    #> 4 B        565000  2009  0.05  0.05  0.04  0.04 719440
    #> 5 B        565000  2009  0.07  0.04  0.04  0.03 719440
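
  • As a rough check of the two points above, here is a minimal base-R sketch. It does not use dplyr, so the split()/sapply() step is only a stand-in for group_by() and not the method of the answer, and the names err, keep_eq, keep_in, wt_ungrouped and wt_by_location are made up for illustration. It reproduces the error message by passing the string "TR" as weights, and shows that == with 2009:2010 keeps only rows 2 and 5 of the example data. Those are the two rows in the question's outside-the-function output, and their ungrouped weighted mean matches the 732310 shown there. This suggests (an inference from the numbers, not something the question states) that the result was computed without grouping by location:

```r
# Same data as in the question
df <- read.table(text = "location total year TR TY TU TJ
A 822400 2010 0.09 0.09 0.07 0.07
A 822400 2010 0.13 0.08 0.08 0.06
B 822400 2010 0.18 0.07 0.10 0.05
B 565000 2009 0.05 0.05 0.04 0.04
B 565000 2009 0.07 0.04 0.04 0.03
A 565000 2008 0.10 0.03 0.05 0.02", header = TRUE)

# 1. A column name passed as a string is a length-1 character vector,
#    so weighted.mean() rejects it as weights for a length-6 x.
err <- tryCatch(
  weighted.mean(df$total, "TR"),
  error = function(e) conditionMessage(e)
)
print(err)

# 2. `==` recycles 2009:2010 along the year column (2009, 2010, 2009, ...),
#    so a row is kept only if its year equals the recycled value at its position.
#    `%in%` tests membership instead.
keep_eq <- df$year == 2009:2010
keep_in <- df$year %in% 2009:2010
print(which(keep_eq))  # rows 2 and 5
print(which(keep_in))  # rows 1 to 5

# 3. Ungrouped weighted mean of total (weights: TR) over the rows kept by `==`
two_rows <- df[keep_eq, ]
wt_ungrouped <- weighted.mean(two_rows$total, two_rows$TR)
print(wt_ungrouped)

# 4. Per-location weighted mean on the rows kept by `%in%`
#    (base-R stand-in for group_by(location), assumed here for illustration)
dff <- df[keep_in, ]
wt_by_location <- sapply(
  split(dff, dff$location),
  function(g) weighted.mean(g$total, g[["TR"]])
)
print(wt_by_location)
```

    With the rows filtered by %in% and the mean taken per location, the values agree with the wt column in the output above (822400 for A, 719440 for B).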