
How to build a new data frame with summarize and cut for each age group


I want to build a new data frame (age_summary) with the total number of people in each age group. I would like to use cut(), and my code is:

set.seed(12345)

#create a numeric variable Age       
AGE <- sample(0:110, 100, replace = TRUE)

# Create data frame
Sample.data <- data.frame(AGE)

age_summary <- Sample.data %>%  summarize(group_by(Sample.data,
                                                   cut(
                                                     AGE,
                                                     breaks=c(0,0.001, 0.083, 2, 13, 65,1000),
                                                     right=TRUE,
                                                     labels = c("Foetus(0 yr)","Neonate (0.001 - 0.082 yr)","Infant(0.083-1.999 yrs)","Child(2-12.999 yrs)", "Adolescent(13-17.999 yrs)","Adult(18-64.999 yrs.)","Elderly(65-199 yrs)")
  ),"Total people" = n())
)

However, my code does not work, and I am not sure what went wrong. Any suggestions on how to solve this?

Edit: I was able to get results that look like this:

enter image description here

Is it possible to get something that looks like this instead? enter image description here

Here is what I get with adorn_totals(.) on a new data set. The total number of people looks OK, but the average age looks strange.

enter image description here

Any idea?


Solution

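  • If we remove the summarise() wrapped around group_by(), the issue is easier to see. cut() needs exactly one fewer label than breaks, but here both vectors have length 7. Adding -Inf (or Inf) to breaks supplies the missing break point. Note that this only makes the lengths agree; it does not check that each label describes the interval it ends up attached to.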

    library(dplyr)
    Sample.data %>%
      group_by(grp = cut(AGE,
                         breaks = c(-Inf, 0, 0.001, 0.083, 2, 13, 65, 1000),
                         right = TRUE,
                         labels = c("Foetus(0 yr)",
                                    "Neonate (0.001 - 0.082 yr)",
                                    "Infant(0.083-1.999 yrs)",
                                    "Child(2-12.999 yrs)",
                                    "Adolescent(13-17.999 yrs)",
                                    "Adult(18-64.999 yrs.)",
                                    "Elderly(65-199 yrs)"))) %>%
      summarise(TotalPeople = n())
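
The labels in the question name seven ranges (0, 0.001-0.082, 0.083-1.999, 2-12.999, 13-17.999, 18-64.999, 65+), while the breaks above have no cut point at 18. As a hedged sketch, assuming the labels describe the intended intervals, the breaks could be written as the lower bound of each range plus a final Inf, with right = FALSE so each interval is closed on the left. The names age_breaks and age_labels are illustrative and not part of the original answer.

```r
library(dplyr)

set.seed(12345)
Sample.data <- data.frame(AGE = sample(0:110, 100, replace = TRUE))

# Assumed lower bound of each labelled range, plus Inf as the open upper end.
age_breaks <- c(0, 0.001, 0.083, 2, 13, 18, 65, Inf)
age_labels <- c("Foetus(0 yr)", "Neonate (0.001 - 0.082 yr)",
                "Infant(0.083-1.999 yrs)", "Child(2-12.999 yrs)",
                "Adolescent(13-17.999 yrs)", "Adult(18-64.999 yrs.)",
                "Elderly(65-199 yrs)")

# cut() needs exactly one more break than labels.
stopifnot(length(age_breaks) == length(age_labels) + 1)

age_summary <- Sample.data %>%
  # right = FALSE gives intervals of the form [lower, upper)
  group_by(grp = cut(AGE, breaks = age_breaks, labels = age_labels,
                     right = FALSE)) %>%
  summarise(TotalPeople = n())

age_summary
```

With these breaks an age of exactly 0 lands in the Foetus group and 18 lands in the Adult group. Because the sample contains only whole years, the Neonate group stays empty here.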
    

    To append a row in which different columns are computed with different functions (a sum for one, a mean for another), use add_row():

    library(dplyr)
    library(tibble)
    library(tidyr)
    Sample.data %>%
      group_by(grp = cut(AGE,
                         breaks = c(-Inf, 0, 0.001, 0.083, 2, 13, 65, 1000),
                         right = TRUE,
                         labels = c("Foetus(0 yr)", "Neonate (0.001 - 0.082 yr)",
                                    "Infant(0.083-1.999 yrs)", "Child(2-12.999 yrs)",
                                    "Adolescent(13-17.999 yrs)", "Adult(18-64.999 yrs.)",
                                    "Elderly(65-199 yrs)"))) %>%
      summarise(TotalPeople = n(), Ave_age = mean(AGE)) %>%
      # Add a row for every age group that has no observations
      complete(grp = levels(grp), fill = list(TotalPeople = 0)) %>%
      # Append a "Total" row: sum of the counts, mean of the group means
      add_row(grp = "Total", TotalPeople = sum(.$TotalPeople),
              Ave_age = mean(.$Ave_age, na.rm = TRUE))
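
The Total row above averages the group means, which equals the overall average age only when every group has the same size. If the overall average is what is wanted, one option is to weight each group mean by its count with weighted.mean(). The sketch below uses a small made-up table (by_group is a hypothetical stand-in for the grouped summary, not data from the question).

```r
library(dplyr)
library(tibble)

# Made-up summary table standing in for the grouped result.
by_group <- tibble(
  grp         = c("Child", "Adult", "Elderly"),
  TotalPeople = c(1, 3, 6),
  Ave_age     = c(10, 40, 80)
)

with_total <- by_group %>%
  add_row(
    grp         = "Total",
    TotalPeople = sum(.$TotalPeople),
    # Weight each group mean by its head count to recover the overall mean.
    Ave_age     = weighted.mean(.$Ave_age, .$TotalPeople)
  )

with_total
```

Here the weighted average is (10*1 + 40*3 + 80*6) / 10 = 61, whereas the plain mean of the three group means would be about 43.3. If empty groups have been filled in with an NA average, passing na.rm = TRUE to weighted.mean() drops those groups along with their weights.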