Tags: r, aggregate, weighted-average

Weighted mean using aggregate


Sorry for asking what might be a very basic question, but I am stuck in a conundrum and cannot seem to get out of it.

I have data that look like this:

Medicine  Biology  Business  sex  weights
0         1        0         1    0.5
0         0        1         0    1
1         0        0         1    5
0         1        0         0    0.33
0         0        1         0    0.33
1         0        0         1    1
0         1        0         0    0.33
0         0        1         1    1
1         0        0         1    1
1           0          0     1     1

The first three columns are dummies for the field of study, the fourth is sex, and the fifth holds the weights. The real data has many more observations. What I want is the mean of each field-of-study dummy (Medicine, Biology, Business) by sex, i.e. one set of means for men and one for women. To do so, I have used the following code:

barplot_sex <- aggregate(x = df_dummies[, 1:19], by = list(df$sex),
                         FUN = function(x) mean(x))

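This aggregate() call works and gives me what I need. My problem is that I now need a weighted mean, but I cannot use

FUN = function(x) weighted.mean(x, weights)

because weights has one entry for every observation, far more than the x that each call receives (one column, restricted to one sex group), so the lengths do not match.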
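A minimal reproduction of the length mismatch, as a sketch: it uses only the first three rows of the example data and assumes dat$weights stands in for the weights used in the call above. aggregate() calls FUN once per group with only that group's values, so weighted.mean() receives an x of length 1 or 2 but a w of length 3 and stops with an error instead of returning a table.

```r
# Toy subset: first three rows of the example data (illustration only)
dat <- data.frame(Medicine = c(0L, 0L, 1L),
                  sex      = c(1L, 0L, 1L),
                  weights  = c(0.5, 1, 5))

# Inside FUN, x holds one group's Medicine values (length 1 for sex 0,
# length 2 for sex 1), but dat$weights always has length 3.
res <- tryCatch(
  aggregate(dat["Medicine"], by = list(sex = dat$sex),
            FUN = function(x) weighted.mean(x, dat$weights)),
  error = function(e) conditionMessage(e)
)
print(res)  # an error message, not a table of weighted means
```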

The only workaround I managed was to use edit(boxplot) and change the values by hand, but R doesn't save the changes. Besides, I am sure there must be a simple way to do exactly what I need.

Any help would be greatly appreciated.

Best, Gabriele


Solution

  • Using by.

    by(dat, dat$sex, function(x) sapply(x[, 1:3], weighted.mean, x[, "weights"]))
    # dat$sex: 0
    # Medicine   Biology  Business 
    # 0.0000000 0.3316583 0.6683417 
    # --------------------------------------------------------------------------------------- 
    # dat$sex: 1
    # Medicine    Biology   Business 
    # 0.82352941 0.05882353 0.11764706 
    

    Data:

    dat <- structure(list(Medicine = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L
    ), Biology = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L), Business = c(0L, 
    1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L), sex = c(1L, 0L, 1L, 0L, 0L, 
    1L, 0L, 1L, 1L), weights = c(0.5, 1, 5, 0.33, 0.33, 1, 0.33, 
    1, 1)), class = "data.frame", row.names = c(NA, -9L))
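If you would rather stay with aggregate(), one possible alternative (a sketch, not the answer's method) relies on the identity weighted mean = sum(w * x) / sum(w): aggregate the weighted values and the weights separately with FUN = sum, then divide. It assumes the column names of the sample data above; fields, num, den and wmeans are helper names introduced here for illustration.

```r
# Same sample data as in the answer's structure() call
dat <- data.frame(
  Medicine = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L),
  Biology  = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L),
  Business = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L),
  sex      = c(1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 1L),
  weights  = c(0.5, 1, 5, 0.33, 0.33, 1, 0.33, 1, 1)
)

fields <- c("Medicine", "Biology", "Business")

# Numerator: per-sex sum of weight * value for each field.
# Multiplying a data frame by a vector of length nrow() scales row i by weights[i].
num <- aggregate(dat[fields] * dat$weights, by = list(sex = dat$sex), FUN = sum)

# Denominator: per-sex sum of weights (groups come out in the same order as num)
den <- aggregate(dat["weights"], by = list(sex = dat$sex), FUN = sum)

# Divide each row of num by its group's total weight
wmeans <- cbind(num["sex"], num[fields] / den$weights)
print(wmeans)
```

The values should agree with the by() output above. Because both aggregate() calls group on the same vector, their rows line up, and as.matrix(wmeans[fields]) gives a sex-by-field numeric matrix that can be passed to barplot().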