I'm having trouble solving the following problem.
Geslacht persondays age_cat contactfirst
1 V 365 <40 2020
2 V 365 <40 2019
3 V 365 70-80 2019
4 V 365 50-60 2019
5 V 365 60-70 2020
6 M 365 50-60 2020
7 V 365 60-70 2019
8 M 39 60-70 2019
9 V 365 60-70 2019
10 M 365 70-80 2020
df <- structure(list(Geslacht = c("V", "V", "V", "V", "V", "M", "V",
"M", "V", "M", "M", "M", "V", "M", "M", "V", "V", "M", "V", "M",
"V", "V", "M", "V", "M", "M", "M", "M", "M", "V", "M", "V", "M",
"V", "M", "M", "V", "M", "M", "M", "M", "V", "M", "V", "M", "M",
"M", "M", "M", "V"), persondays_individual = c(365, 365, 365,
365, 365, 365, 365, 39, 365, 365, 365, 365, 365, 365, 365, 365,
365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365,
365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365,
365, 365, 365, 365, 365, 365, 365, 365), age_cat = structure(c(1L,
1L, 6L, 4L, 5L, 4L, 5L, 5L, 5L, 6L, 1L, 5L, 4L, 6L, 5L, 7L, 6L,
3L, 5L, 5L, 5L, 7L, 5L, 6L, 4L, 4L, 4L, 1L, 6L, 6L, 4L, 7L, 7L,
4L, 3L, 4L, 5L, 5L, 1L, 4L, 6L, 6L, 5L, 5L, 4L, 3L, 7L, 5L, 5L,
4L), .Label = c("<40", ">90", "40-50", "50-60", "60-70", "70-80",
"80-90"), class = "factor"), contactfirst_cat = structure(c(11L,
10L, 10L, 10L, 11L, 11L, 10L, 10L, 10L, 11L, 11L, 10L, 11L, 10L,
11L, 10L, 10L, 10L, 10L, 10L, 11L, 10L, 11L, 11L, 11L, 10L, 11L,
10L, 11L, 11L, 10L, 11L, 11L, 11L, 10L, 10L, 10L, 11L, 11L, 10L,
10L, 11L, 11L, 11L, 10L, 10L, 10L, 10L, 11L, 11L), .Label = c("<2011",
"2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018",
"2019", "2020"), class = "factor")), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))
I would like to create two new variables. One that sums the persons days per age_cat and Gender. And the second that counts the amount of the 'contactfirst' in a certain year (in this case 2019)
Desired output:
Gender age_cat persondays_total contactfirst_total
V <40 730 1
V 50-60 365 1
V 60-70 1095 2
V 70-80 365 1
M 50-60 365 0
M 60-70 39 1
M 70-80 365 0
I've tried to do it with group_by
(following code). But this counts all the persondays in the dataframe (so not per gender & age_category) and does not create the column newcontacts.
df2 <- df %>% group_by(Gender, age_cat) %>%
mutate(personyears_total = sum(persondays_individual)) %>%
mutate(newcontacts = nrow(df$contactfirst == "2019"
Any help would be much appreciated.
If you don't want to bother with packages, try aggregate
.
aggregate(cbind(persondays_individual, contactfirst_cat) ~ age_cat + Geslacht,
transform(df, contactfirst_cat=contactfirst_cat == 2019), sum)
# age_cat Geslacht persondays_individual contactfirst_cat
# 1 <40 M 1095 1
# 2 40-50 M 1095 3
# 3 50-60 M 2920 5
# 4 60-70 M 2959 4
# 5 70-80 M 1460 2
# 6 80-90 M 730 1
# 7 <40 V 730 1
# 8 50-60 V 1460 1
# 9 60-70 V 2555 4
# 10 70-80 V 1825 2
# 11 80-90 V 1095 2