I have a data frame with multiple answers from a kind of census. I want to sum the number of people who actually live in certain places, and to do so I also need to apply a weight: I can't just sum all the resident counts the table shows.
  ZONA   ID_DOM   FE_DOM NO_MORAD
1    1 00010001 15.41667        2
2    1 00010001 15.41667        2
3    1 00010001 15.41667        2
4    1 00010001 15.41667        2
5    1 00010001 15.41667        2
6    1 00010002 15.41667        4
To restate: I want the sum of NO_MORAD by ZONA, counting each ID_DOM only once, with everything weighted by FE_DOM.
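Read literally, the requirement is a weighted sum over distinct households per zone. Writing D_z for the set of distinct ID_DOM values in zone z, w_d for a household's FE_DOM and n_d for its NO_MORAD (symbols introduced here only for illustration), the target and its value for the sample rows above would be:

```latex
S_z = \sum_{d \in D_z} w_d \, n_d,
\qquad
S_1 = 15.41667 \times 2 + 15.41667 \times 4 = 92.50002
```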
To count just the number of distinct ID_DOM values, I used
Zona <- count(OD_2017[!duplicated(OD_2017$ID_DOM),], wt = FE_DOM, Zonas=ZONA, name = "N_domicilios")
But now I don't know how to get the weighted sum of NO_MORAD. I was trying something like
Zona <- OD_2017 %>%
  group_by(ZONA) %>%
  summarise(ID_DOM = n_distinct(ID_DOM), weights(FE_DOM))
but it didn't work.
Any tips?
Thanks
I see pipes in your attempts, but here is one approach using data.table.
Data:
df <- structure(list(ZONA = c(1, 1, 1, 1, 1, 1), ID_DOM = c("00010001",
"00010001", "00010001", "00010001", "00010001", "00010002"), FE_DOM = c(15.41667, 15.41667,
15.41667, 15.41667, 15.41667, 15.41667), NO_MORAD = c(2, 2, 2,
2, 2, 4)), class = "data.frame", row.names = c(NA, -6L))
Code:
library(data.table)
dt <- as.data.table(df)
# Within each ZONA, drop duplicate rows, then sum the weighted resident counts
dt[, unique(.SD)[, .(WeightedSum = sum(FE_DOM * NO_MORAD))], by = "ZONA"]
Output:
   ZONA WeightedSum
1:    1    92.50002
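One caveat, offered as an assumption about the full data: unique(.SD) compares entire rows, so if OD_2017 has other columns that vary within a household (the excerpt shows only four), every such row survives and the household is counted more than once. The sketch below deduplicates on ZONA and ID_DOM explicitly with unique(dt, by = ...). The ID_PESS column is hypothetical, added only to simulate a per-row field. It also assumes FE_DOM and NO_MORAD are constant within a household, since only the first row of each household is kept.

```r
library(data.table)

# Sample data from the question, plus a hypothetical per-row column (ID_PESS)
# that differs on every row, as it might in the full survey table.
df <- data.frame(
  ZONA     = c(1, 1, 1, 1, 1, 1),
  ID_DOM   = c("00010001", "00010001", "00010001",
               "00010001", "00010001", "00010002"),
  FE_DOM   = rep(15.41667, 6),
  NO_MORAD = c(2, 2, 2, 2, 2, 4),
  ID_PESS  = 1:6
)
dt <- as.data.table(df)

# unique(.SD) compares whole rows, so the varying ID_PESS keeps all 6 rows
naive <- dt[, unique(.SD)[, .(WeightedSum = sum(FE_DOM * NO_MORAD))], by = "ZONA"]

# Keep one row per (ZONA, ID_DOM) pair instead, then sum the weighted residents
res <- unique(dt, by = c("ZONA", "ID_DOM"))[
  , .(WeightedSum = sum(FE_DOM * NO_MORAD)), by = ZONA]

print(naive)
print(res)
```

With the extra column, the whole-row version returns about 215.83 (all six rows counted), while the keyed version returns 92.50002, the same as the output above.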