Search code examples
rdataframeaggregate

Getting the sum of selected rows in a data frame in R


I'm creating a population age pyramid for my metro area from US Census data. Here's some sample data:

Sex = rep(c("male", "female"), each = 23)
Age_group = as.factor(rep(c(0, 5, 10, 15, 18, 20, 21, 22, 25, 30, 35, 40, 45, 50, 55, 60, 62, 65, 67, 70, 75, 80, 85), length = 46))
pop = c(-24684, -24946, -26465, -16103, -11431, -6233, -6071, -14838, -27420, -26246, -24612, -22753, -23036, -24870, -27676, -11400, -14267, -9493, -12132, -17456, -10913, -7241, -5836, 23322, 23225, 25521, 15128, 11388, 5858, 5300, 15385, 27610, 26368, 25329, 23045, 24025, 25847, 28077, 12419, 16241, 10000, 14411, 20807, 14309, 10216, 11125)

pop_pyramid <- data.frame(Sex, Age_group, pop)

My issue: While most of the data is in 5-year age groups, the 15, 18, 20, 21, 22, 60, 62, 65, and 67 age groups for each sex are not. I'd like to combine the population number for the 15 and 18 groups into the 15 age group, the 20, 21, and 22 age groups into the 20 age group, the 60 and 62 age groups the 60 age group, and the 65 and 67 age groups into the 65 age group.

Simply adding the rows I want fails as Age_group and Sex are factors; rowSums also fails for the same reason. I thought of aggregation but only need those select rows aggregated, not the entire data frame. Is there a way to do this without resorting to doing it by hand?

Thanks.


Solution

  • I would suggest to remove the modulus of 5 from Age_group then you can continue with the calculations:

    pop_pyramid$Age_group2 <- as.numeric(as.character(Age_group)) -
      as.numeric(as.character(Age_group))%%5
    pop_pyramid
    
          Sex Age_group    pop Age_group2
    1    male         0 -24684          0
    2    male         5 -24946          5
    3    male        10 -26465         10
    4    male        15 -16103         15
    5    male        18 -11431         15
    6    male        20  -6233         20
    7    male        21  -6071         20
    8    male        22 -14838         20
    9    male        25 -27420         25
    10   male        30 -26246         30
    11   male        35 -24612         35
    12   male        40 -22753         40
    13   male        45 -23036         45
    14   male        50 -24870         50
    15   male        55 -27676         55
    16   male        60 -11400         60
    17   male        62 -14267         60
    18   male        65  -9493         65
    19   male        67 -12132         65
    20   male        70 -17456         70
    21   male        75 -10913         75
    22   male        80  -7241         80
    23   male        85  -5836         85
    24 female         0  23322          0
    25 female         5  23225          5
    26 female        10  25521         10
    27 female        15  15128         15
    28 female        18  11388         15
    29 female        20   5858         20
    30 female        21   5300         20
    31 female        22  15385         20