
R query - Is it possible to use "sapply" and the "weighted.mean" function together?


I've been using some code to compute means for specific variable values (demographic breaks). I now have data with a weight variable and need to calculate weighted means instead. Is it possible to adjust the function I already use for sample means so that it calculates the weighted mean? Here is some code to generate sample data:

df <- data.frame(gender=c(2,2,1,1,2,2,1,1,1,1,1,1,2,2,2,2,1,2,2,1),
                 agegroup=c(2,2,7,5,5,5,2,7,2,2,4,4,4,3,4,5,3,3,6,6),
                 attitude_1=c(4,3,4,4,4,4,4,4,5,2,5,5,5,4,3,2,3,4,2,4),
                 attitude_2=c(4,4,1,3,4,2,4,5,5,5,5,4,5,4,3,3,4,4,4,4),
                 attitude_3=c(2,2,1,1,3,2,5,1,4,2,2,2,3,3,4,1,4,1,3,1),
                 income=c(40794,74579,62809,47280,72056,57908,70784,96742,66629,117530,79547,54110,39569,111217,109146,56421,106206,28385,85830,71110),
                 weight=c(1.77,1.89,2.29,6.14,2.07,5.03,0.73,1.60,1.95,2.56,5.41,2.02,6.87,3.23,3.01,4.68,3.42,2.75,2.31,4.04))

So far I've been using this code to get the sample means:

assign("Gender_Profile_1", 
       data.frame(sapply(subset(df, gender==1), FUN = function(x) mean(x, na.rm = TRUE))))

> Gender_Profile_1
           sapply.subset.df..gender....1...FUN...function.x..mean.x..na.rm...TRUE..
gender                                                                        1.000
agegroup                                                                      4.200
attitude_1                                                                    4.000
attitude_2                                                                    4.000
attitude_3                                                                    2.300
income                                                                    77274.700
weight                                                                        3.016

As you can see, it generates Gender_Profile_1 with the means for all variables. In my attempt to calculate the weighted mean, I changed the "FUN=" part to this:

assign("Gender_Profile_1", 
       data.frame(sapply(subset(df, gender==1), FUN = function(x) weighted.mean(x, w=weight,na.rm = TRUE))))

I get the following error message:

 Error in weighted.mean.default(x, w = weight, na.rm = TRUE) : 
  'x' and 'w' must have the same length 

I've been trying all kinds of permutations of df$weight and df$x, but nothing seems to work. Any help or ideas would be great. Many thanks


Solution

  • I think the main problem with your code is that you refer to the weight column inside the sapply call, but that column has not been subset the way df has, so it has more values than the subsetted columns it is paired with. You can subset the weights before calling sapply and then use those subsetted weights inside the function.

    Using the code you posted:

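    # Take the weights from the same rows that are being averaged
    weight <- subset(df, gender==1)[,"weight"]
    # Same call as in the question; weight now has one value per subsetted row
    assign("Gender_Profile_2", 
           data.frame(sapply(subset(df, gender==1), FUN = function(x) weighted.mean(x, w=weight,na.rm = TRUE))))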
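
The length mismatch behind the error can be checked directly. This is a minimal sketch, assuming that a full-length weight vector was available in the workspace when the original call ran (here it is created explicitly as `full_weight`, a name introduced only for illustration). Only three of the question's columns are reproduced to keep it short.

```r
# Three columns copied from the sample data in the question
df <- data.frame(gender=c(2,2,1,1,2,2,1,1,1,1,1,1,2,2,2,2,1,2,2,1),
                 attitude_1=c(4,3,4,4,4,4,4,4,5,2,5,5,5,4,3,2,3,4,2,4),
                 weight=c(1.77,1.89,2.29,6.14,2.07,5.03,0.73,1.60,1.95,2.56,5.41,2.02,6.87,3.23,3.01,4.68,3.42,2.75,2.31,4.04))

sub <- subset(df, gender == 1)

# Assumption: an unsubsetted weight vector, one value per row of df
full_weight <- df$weight

length(full_weight)  # 20, one per row of df
nrow(sub)            # 10, only the rows with gender == 1

# Pairing 10 values with 20 weights fails; capture the message instead of stopping
mismatch <- tryCatch(
  weighted.mean(sub$attitude_1, w = full_weight, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)

# Weights taken from the same subset have matching length, so this works
ok <- weighted.mean(sub$attitude_1, w = sub$weight, na.rm = TRUE)
```

Taking the weights from `sub` itself, rather than from a separately maintained vector, keeps the two lengths in step by construction.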
    

    Here is another solution using apply, that might be easier to implement:

    # Unweighted column means for the rows with gender == 1 (2 = apply over columns)
    apply(subset(df, gender==1), 2, FUN = function(x) mean(x, na.rm = TRUE))
    # Weights (column 7) of the rows that have gender == 1
    weight <- subset(df, gender==1)[,7]
    # Weighted column means, with the weight column itself dropped from the data
    apply(subset(df[,-7], gender==1), 2, FUN = function(x) weighted.mean(x, w=weight, na.rm = TRUE))
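
Since the same profile is needed for several demographic breaks, the subsetting and weighting steps above could be wrapped in a small function. The following is only a sketch: `weighted_profile`, its arguments, and the `weighted_mean` column name are hypothetical and not part of the original question or answer. It subsets the rows once, takes the weights from that same subset, and leaves the weight column out of the result.

```r
# Sample data copied from the question
df <- data.frame(gender=c(2,2,1,1,2,2,1,1,1,1,1,1,2,2,2,2,1,2,2,1),
                 agegroup=c(2,2,7,5,5,5,2,7,2,2,4,4,4,3,4,5,3,3,6,6),
                 attitude_1=c(4,3,4,4,4,4,4,4,5,2,5,5,5,4,3,2,3,4,2,4),
                 attitude_2=c(4,4,1,3,4,2,4,5,5,5,5,4,5,4,3,3,4,4,4,4),
                 attitude_3=c(2,2,1,1,3,2,5,1,4,2,2,2,3,3,4,1,4,1,3,1),
                 income=c(40794,74579,62809,47280,72056,57908,70784,96742,66629,117530,79547,54110,39569,111217,109146,56421,106206,28385,85830,71110),
                 weight=c(1.77,1.89,2.29,6.14,2.07,5.03,0.73,1.60,1.95,2.56,5.41,2.02,6.87,3.23,3.01,4.68,3.42,2.75,2.31,4.04))

# Hypothetical helper: weighted means of every non-weight column
# for the rows selected by the logical vector `rows`
weighted_profile <- function(data, rows, weight_col = "weight") {
  sub  <- data[rows, , drop = FALSE]        # subset the rows once
  w    <- sub[[weight_col]]                 # weights from the same rows
  vars <- setdiff(names(sub), weight_col)   # leave the weight column out
  means <- sapply(sub[vars], function(x) weighted.mean(x, w = w, na.rm = TRUE))
  data.frame(weighted_mean = means)         # variable names become row names
}

Gender_Profile_1 <- weighted_profile(df, df$gender == 1)
Gender_Profile_2 <- weighted_profile(df, df$gender == 2)
```

One caveat: `na.rm = TRUE` in `weighted.mean` drops missing values of `x` together with their weights, but it does not handle missing values in the weights themselves, so a weight column containing `NA` would need separate treatment.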