I have a data set like this:
df = read.table(text=' location total year TR TY TU TJ
A 822400 2010 0.09 0.09 0.07 0.07
A 822400 2010 0.13 0.08 0.08 0.06
B 822400 2010 0.18 0.07 0.10 0.05
B 565000 2009 0.05 0.05 0.04 0.04
B 565000 2009 0.07 0.04 0.04 0.03
A 565000 2008 0.10 0.03 0.05 0.02',header=T)
I want to compute the total-weighted mean for the two locations, by year and by property (TR, TY, TU or TJ), using a function. To this end I wrote this:
total.weighted.mean <- function(df, properties, years){
dff<-filter(df, year==years)
res<-dff%>%
group_by(location) %>%
mutate(wt = weighted.mean(total, properties))
print(res)
}
total.weighted.mean( df, properties = "TR", years = 2009:2010)
But inside the function I get this error:
Error in weighted.mean.default(total, properties) :
'x' and 'w' must have the same length
When I compute it outside the function, I get this:
location total year TR TY TU TJ wt
<chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 822400 2010 0.13 0.08 0.08 0.06 732310
2 B 565000 2009 0.07 0.04 0.04 0.03 732310
Is it correct to get the same wt for each location, given that the locations have different total values?
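The main issue is that you pass the weights variable as a string, so
weighted.mean receives the length-one string "TR" rather than the column. To
tell dplyr that you mean the variable in your dataset, you could e.g. make use
of the .data pronoun. Additionally, when filtering for several years you should
use %in% instead of ==, because == compares element by element and recycles
years along the column: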
library(dplyr)
df = read.table(text=' location total year TR TY TU TJ
A 822400 2010 0.09 0.09 0.07 0.07
A 822400 2010 0.13 0.08 0.08 0.06
B 822400 2010 0.18 0.07 0.10 0.05
B 565000 2009 0.05 0.05 0.04 0.04
B 565000 2009 0.07 0.04 0.04 0.03
A 565000 2008 0.10 0.03 0.05 0.02',header=T)
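total.weighted.mean <- function(df, properties, years) {
  # keep rows whose year is any of the requested years
  dff <- filter(df, year %in% years)
  res <- dff %>%
    group_by(location) %>%
    # .data[[properties]] looks up the column named by the string
    mutate(wt = weighted.mean(total, .data[[properties]]))
  res
}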
total.weighted.mean( df, properties = "TR", years = 2009:2010)
#> # A tibble: 5 x 8
#> # Groups: location [2]
#> location total year TR TY TU TJ wt
#> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A 822400 2010 0.09 0.09 0.07 0.07 822400
#> 2 A 822400 2010 0.13 0.08 0.08 0.06 822400
#> 3 B 822400 2010 0.18 0.07 0.1 0.05 719440
#> 4 B 565000 2009 0.05 0.05 0.04 0.04 719440
#> 5 B 565000 2009 0.07 0.04 0.04 0.03 719440
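As for why both rows showed the same wt outside the function: a minimal base R sketch, assuming that computation was run with == and without group_by (the question does not show it). The names keep_eq, keep_in and wt_eq are made up for illustration.

```r
# Same data as in the question
df <- read.table(text = "location total year TR TY TU TJ
A 822400 2010 0.09 0.09 0.07 0.07
A 822400 2010 0.13 0.08 0.08 0.06
B 822400 2010 0.18 0.07 0.10 0.05
B 565000 2009 0.05 0.05 0.04 0.04
B 565000 2009 0.07 0.04 0.04 0.03
A 565000 2008 0.10 0.03 0.05 0.02", header = TRUE)

years <- 2009:2010

# `==` recycles `years` along the column: 2009, 2010, 2009, 2010, 2009, 2010
keep_eq <- df$year == years
# `%in%` tests each row's year for membership in `years`
keep_in <- df$year %in% years

which(keep_eq)  # rows 2 and 5, the two rows shown in the question
which(keep_in)  # rows 1 to 5

# Ungrouped weighted mean of `total` with `TR` as weights on the rows kept by `==`
wt_eq <- with(df[keep_eq, ], weighted.mean(total, TR))
wt_eq           # about 732310, the value printed in the question
```

Because no grouping is applied here, a single value is computed and repeated for every row. Note also that weighted.mean(x, w) takes the values first and the weights second, so weighted.mean(total, TR) averages total using TR as the weights; if total is meant to be the weight, the two arguments would need to be swapped.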