
R - Weighted sum


I have a data frame with multiple answers from a sort of census. I want to sum the number of people who actually live in certain places. To do so I also need to apply a weighting variable; I can't just sum the number of people that the table shows.

  ZONA   ID_DOM   FE_DOM NO_MORAD
1    1 00010001 15.41667        2
2    1 00010001 15.41667        2
3    1 00010001 15.41667        2
4    1 00010001 15.41667        2
5    1 00010001 15.41667        2
6    1 00010002 15.41667        4

To restate: I want the sum of NO_MORAD by ZONA, counting each ID_DOM only once, with everything weighted by FE_DOM.

To just count the number of ID_DOMs, I used

Zona <- count(OD_2017[!duplicated(OD_2017$ID_DOM),], wt = FE_DOM, Zonas=ZONA, name = "N_domicilios")

but now I don't know how to get the weighted sum. I was trying something like

Zona <- OD_2017 %>%
  group_by(ZONA) %>%
  summarise(ID_DOM = n_distinct(ID_DOM), weights(FE_DOM))

but it didn't work.

Any tips?

Thanks


Solution

  • I see pipes in your attempts, but here is one approach using data.table.

    Data:

    df <- structure(list(ZONA = c(1, 1, 1, 1, 1, 1), ID_DOM = c("00010001", 
    "00010001", "00010001", "00010001", "00010001", "00010002"), FE_DOM = c(15.41667, 15.41667, 
    15.41667, 15.41667, 15.41667, 15.41667), NO_MORAD = c(2, 2, 2, 
    2, 2, 4)), class = "data.frame", row.names = c(NA, -6L))
    

    Code:

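    library(data.table)
    dt <- as.data.table(df)
    # Within each ZONA, drop duplicate rows so each ID_DOM is counted once,
    # then sum NO_MORAD weighted by FE_DOM.
    dt[, unique(.SD)[, .(WeightedSum = sum(FE_DOM * NO_MORAD))], by = "ZONA"]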
    

    Output:

       ZONA WeightedSum
    1:    1    92.50002
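
Since the question uses pipes, the same idea can probably be written with dplyr as well. This is a minimal sketch, not part of the original answer. It assumes that FE_DOM and NO_MORAD are constant within each ID_DOM, so keeping the first row per household is enough. The name `result` is only illustrative.

```r
library(dplyr)

# Same sample data as above
df <- data.frame(
  ZONA = c(1, 1, 1, 1, 1, 1),
  ID_DOM = c("00010001", "00010001", "00010001",
             "00010001", "00010001", "00010002"),
  FE_DOM = rep(15.41667, 6),
  NO_MORAD = c(2, 2, 2, 2, 2, 4)
)

result <- df %>%
  # Keep one row per household within each zone (assumes the other
  # columns do not vary inside an ID_DOM)
  distinct(ZONA, ID_DOM, .keep_all = TRUE) %>%
  group_by(ZONA) %>%
  # Weighted sum: residents per household times the household weight
  summarise(WeightedSum = sum(FE_DOM * NO_MORAD))
```

For the sample data this should give the same single row as the data.table version. One difference to keep in mind: `unique(.SD)` removes rows that are duplicated across all remaining columns, while `distinct(ZONA, ID_DOM, .keep_all = TRUE)` deduplicates on the household ID only, so the two would diverge if a household had inconsistent FE_DOM or NO_MORAD values.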