I've been using some code to compute means for specific values of a variable (demographic breaks), but I now have data with a weight variable and need to calculate weighted means. Is it possible to adjust the function I already use for sample means so that it calculates the weighted mean instead? Here is some code to generate sample data:
df <- data.frame(gender=c(2,2,1,1,2,2,1,1,1,1,1,1,2,2,2,2,1,2,2,1),
agegroup=c(2,2,7,5,5,5,2,7,2,2,4,4,4,3,4,5,3,3,6,6),
attitude_1=c(4,3,4,4,4,4,4,4,5,2,5,5,5,4,3,2,3,4,2,4),
attitude_2=c(4,4,1,3,4,2,4,5,5,5,5,4,5,4,3,3,4,4,4,4),
attitude_3=c(2,2,1,1,3,2,5,1,4,2,2,2,3,3,4,1,4,1,3,1),
income=c(40794,74579,62809,47280,72056,57908,70784,96742,66629,117530,79547,54110,39569,111217,109146,56421,106206,28385,85830,71110),
weight=c(1.77,1.89,2.29,6.14,2.07,5.03,0.73,1.60,1.95,2.56,5.41,2.02,6.87,3.23,3.01,4.68,3.42,2.75,2.31,4.04))
So far I've been using this code to get sample means:
assign("Gender_Profile_1",
data.frame(sapply(subset(df, gender==1), FUN = function(x) mean(x, na.rm = TRUE))))
> Gender_Profile_1
sapply.subset.df..gender....1...FUN...function.x..mean.x..na.rm...TRUE..
gender 1.000
agegroup 4.200
attitude_1 4.000
attitude_2 4.000
attitude_3 2.300
income 77274.700
weight 3.016
As you can see, it generates Gender_Profile_1 with the means for all variables. In my attempt to calculate the weighted mean, I changed the function passed to FUN to this:
assign("Gender_Profile_1",
data.frame(sapply(subset(df, gender==1), FUN = function(x) weighted.mean(x, w=weight,na.rm = TRUE))))
I get the following error message
Error in weighted.mean.default(x, w = weight, na.rm = TRUE) :
'x' and 'w' must have the same length
I've been trying all kinds of permutations of df$weight and df$x, but nothing seems to work. Any help or ideas would be great. Many thanks
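I think the main problem with your code is that the weight vector you refer to inside the sapply call has not been subsetted the way df has. sapply passes each column of the subset (the 10 rows where gender==1) as x, while weight presumably still holds all 20 values, hence the "'x' and 'w' must have the same length" error. You can subset the weight column before calling sapply and then use that subsetted vector inside the function.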
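To make the length mismatch concrete, here is a minimal sketch on a made-up four-row data frame (toy and its values are hypothetical, not the data from the question). It shows that weights taken from the full data frame fail, while weights taken from the same subset work:

```r
# Made-up toy data: 4 rows, 2 of them with gender == 1
toy <- data.frame(gender = c(1, 2, 1, 2),
                  score  = c(10, 20, 30, 40),
                  weight = c(1, 1, 3, 1))

sub <- subset(toy, gender == 1)
length(sub$score)    # 2 values after subsetting
length(toy$weight)   # 4 weights in the full data frame

# Mismatched lengths trigger the same error as in the question
res <- try(weighted.mean(sub$score, w = toy$weight), silent = TRUE)
inherits(res, "try-error")

# Weights taken from the same subset have the right length
wm <- weighted.mean(sub$score, w = sub$weight)
wm   # (10*1 + 30*3) / (1 + 3) = 25
```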
Using the code you posted:
weight <- subset(df, gender==1)[,"weight"]
# Same weighted.mean call you posted; weight now has one value per row of the subset
assign("Gender_Profile_2",
data.frame(sapply(subset(df, gender==1), FUN = function(x) weighted.mean(x, w=weight,na.rm = TRUE))))
Here is another solution using apply that might be easier to implement:
# Unweighted means by column, for comparison
apply(subset(df, gender==1), 2, FUN = function(x) mean(x, na.rm = TRUE))
# Get the weights (column 7) of the rows that have gender == 1
weight <- subset(df, gender==1)[,7]
# Apply the weighted mean function to every column except the weight column
apply(subset(df[,-7], gender==1), 2, FUN = function(x) weighted.mean(x, w=weight,na.rm = TRUE))
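Both versions above rely on a weight object in the calling environment that has to be kept in sync with the subset. One way to avoid that is to take the weights from whatever data frame is passed in. The following is only a sketch: weighted_profile and its wcol argument are hypothetical names, and toy is made-up data rather than the data from the question.

```r
# Hypothetical helper: weighted mean of every non-weight column in `data`,
# using the column named by `wcol` as the weights.
weighted_profile <- function(data, wcol = "weight") {
  w <- data[[wcol]]                    # weights come from the same rows as the data
  vars <- setdiff(names(data), wcol)   # every column except the weight column
  sapply(data[vars], function(x) weighted.mean(x, w = w, na.rm = TRUE))
}

# Made-up toy data
toy <- data.frame(gender = c(1, 2, 1, 2),
                  score  = c(10, 20, 30, 40),
                  weight = c(1, 1, 3, 1))

# Profile for one demographic break
p1 <- weighted_profile(subset(toy, gender == 1))
p1

# One column of weighted means per gender value
profiles <- sapply(split(toy, toy$gender), weighted_profile)
profiles
```

With the data from the question, the equivalent call would be weighted_profile(subset(df, gender==1)). Because the weights are read inside the function, they always have the same length as the columns being averaged.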