Search code examples
rcountsumconditional-statements

sum and count value based on two criteria


I'm having trouble solving the following problem.

Geslacht               persondays age_cat contactfirst         
 1 V                          365 <40     2020            
 2 V                          365 <40     2019            
 3 V                          365 70-80   2019            
 4 V                          365 50-60   2019            
 5 V                          365 60-70   2020            
 6 M                          365 50-60   2020            
 7 V                          365 60-70   2019            
 8 M                           39 60-70   2019            
 9 V                          365 60-70   2019            
10 M                          365 70-80   2020            

df <- structure(list(Geslacht = c("V", "V", "V", "V", "V", "M", "V", 
"M", "V", "M", "M", "M", "V", "M", "M", "V", "V", "M", "V", "M", 
"V", "V", "M", "V", "M", "M", "M", "M", "M", "V", "M", "V", "M", 
"V", "M", "M", "V", "M", "M", "M", "M", "V", "M", "V", "M", "M", 
"M", "M", "M", "V"), persondays_individual = c(365, 365, 365, 
365, 365, 365, 365, 39, 365, 365, 365, 365, 365, 365, 365, 365, 
365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 
365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 
365, 365, 365, 365, 365, 365, 365, 365), age_cat = structure(c(1L, 
1L, 6L, 4L, 5L, 4L, 5L, 5L, 5L, 6L, 1L, 5L, 4L, 6L, 5L, 7L, 6L, 
3L, 5L, 5L, 5L, 7L, 5L, 6L, 4L, 4L, 4L, 1L, 6L, 6L, 4L, 7L, 7L, 
4L, 3L, 4L, 5L, 5L, 1L, 4L, 6L, 6L, 5L, 5L, 4L, 3L, 7L, 5L, 5L, 
4L), .Label = c("<40", ">90", "40-50", "50-60", "60-70", "70-80", 
"80-90"), class = "factor"), contactfirst_cat = structure(c(11L, 
10L, 10L, 10L, 11L, 11L, 10L, 10L, 10L, 11L, 11L, 10L, 11L, 10L, 
11L, 10L, 10L, 10L, 10L, 10L, 11L, 10L, 11L, 11L, 11L, 10L, 11L, 
10L, 11L, 11L, 10L, 11L, 11L, 11L, 10L, 10L, 10L, 11L, 11L, 10L, 
10L, 11L, 11L, 11L, 10L, 10L, 10L, 10L, 11L, 11L), .Label = c("<2011", 
"2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", 
"2019", "2020"), class = "factor")), row.names = c(NA, -50L), class = c("tbl_df", 
"tbl", "data.frame"))

I would like to create two new variables. One that sums the persons days per age_cat and Gender. And the second that counts the amount of the 'contactfirst' in a certain year (in this case 2019)

Desired output:

Gender       age_cat    persondays_total    contactfirst_total
  V            <40           730            1
  V            50-60         365            1
  V            60-70         1095           2
  V            70-80         365            1
  M            50-60         365            0
  M            60-70         39             1
  M            70-80         365            0

I've tried to do it with group_by (following code). But this counts all the persondays in the dataframe (so not per gender & age_category) and does not create the column newcontacts.

df2 <- df %>% group_by(Gender, age_cat) %>%
         mutate(personyears_total = sum(persondays_individual)) %>%
         mutate(newcontacts = nrow(df$contactfirst == "2019"

Any help would be much appreciated.


Solution

  • If you don't want to bother with packages, try aggregate.

    aggregate(cbind(persondays_individual, contactfirst_cat) ~ age_cat + Geslacht, 
              transform(df, contactfirst_cat=contactfirst_cat == 2019), sum)
    #    age_cat Geslacht persondays_individual contactfirst_cat
    # 1      <40        M                  1095                1
    # 2    40-50        M                  1095                3
    # 3    50-60        M                  2920                5
    # 4    60-70        M                  2959                4
    # 5    70-80        M                  1460                2
    # 6    80-90        M                   730                1
    # 7      <40        V                   730                1
    # 8    50-60        V                  1460                1
    # 9    60-70        V                  2555                4
    # 10   70-80        V                  1825                2
    # 11   80-90        V                  1095                2