Search code examples
rsumaggregation

Sum row values and create new category


I'm working with a population dataframe and I have information for different years and age groups divided by bins of five years. Once I filtered the information for the location I'm interested in I have this:

Location    age group   total90  total95  total00  total05  total10
  A          0 to 4      10428    118902     76758   967938   205472
  A          5 to 9      18530    238928    260331   277635   303180    
  A         10 to 14    180428    208902    226758   267938   305472
  A         15 to 19    185003    332089    242267   261793   135472

Now what I want is to create new age groups to have something like this:

Location    age group       total90  total95    total00  total05    total10
  A          5 to 14        198958   447830     487089    545573    608652
  A           other         195431   450991     319025   1229731    340944   

where

age group "5 to 14" is the total of "5 to 9" + "10 to 14" for each year &

"other" is the total of "0 to 4" + "15 to 19" for each year

I tried selecting the columns with numbers so I could add the totals for each age group and create a row with the new age group but I can't add rows in an easy way and I'm complicating things more. I'm sure there is an easy way to solve this but I'm stuck.


Solution

  • I had to alter your dummy data slightly (juste removed some space to make reading from plain text easy)to make it work without further manipulation

    df <- data.table::fread("Location    age_group   total90  total95  total00  total05  total10
      A          0_to_4      10428    118902     76758   967938   205472
      A          5_to_9      18530    238928    260331   277635   303180    
      A         10_to_14    180428    208902    226758   267938   305472
      A         15_to_19    185003    332089    242267   261793   135472")
    
    library(tidyverse)
    
    df %>%
        # alter the character variable age_group reducing problem to one ifelse clause
        dplyr::mutate(age_group = ifelse(age_group == "5_to_9" | age_group == "10_to_14", "5_to_14", "other")) %>%
        # build grouping (I included Location but possibly your need is diferent)
        dplyr::group_by(Location, age_group) %>%
        # sum in one call all not grouped columns (therefore you have to remove Location in case you do not want it in the grouping
        dplyr::summarize(across(everything(), ~sum(.x))) %>%
        # ungrouping prevents unwanted behaviour down stream
        dplyr::ungroup()
    
    # A tibble: 2 x 7
      Location age_group total90 total95 total00 total05 total10
      <chr>    <chr>       <int>   <int>   <int>   <int>   <int>
    1 A        5_to_14    198958  447830  487089  545573  608652
    2 A        other      195431  450991  319025 1229731  340944