Search code examples
rdplyrsummarize

Issues with combining a case when into a cumsum calculation in R


Below is the sample data and my attempt at this. My primary question is how I would get the smbsummary3 data frame to show values of small = 2, 3, or 4 when they do not exist in the source data. My summarise section calculates correctly. Do I need to add a case_when statement prior to the summarise or treat the lack of 2,3,4 as NA?

emp <- c(1,2,3,4,5,6,7,8,1,12,54,101,33,159,201,261)
small <- c(1,1,1,1,1,1,1,1,1,1,1,2,1,3,3,4)
area <-c(001,001,001,001,001,001,001,001,003,003,003,003,003,003,003,003)

smbtest2 <- data.frame(emp,small,area)

 smbtest2 <- smbtest2 %>% mutate(smb = case_when(emp >=0 & emp <100 ~ "1",emp >=0 & emp <150 ~ "2",emp >=0 & emp <250 ~ "3", emp >=0 & emp <500 ~ "4",emp >=0 & emp <1000000 ~ "Total"))

smbsummary3<-smbtest2 %>% 
group_by(area,small) %>%
summarise(emp = sum(emp), worksites = n(), 
        .groups = 'drop_last') %>% 
mutate(emp = cumsum(emp),
     worksites = cumsum(worksites)) 

Schema for the small variable.

  Emp               smb
  0 to 100           1
  0 to 150           2
  0 to 250           3
  0 to 500           4

Desired result

  area       small    emp   worksites
  001          1       36      8
  001          2       36      8
  001          3       36      8
  001          4       36      8
  003          1       100     4
  003          2       201     5
  003          3       561     7
  003          4       822     8

Solution

  • If we need the missing combinations, do a complete

    library(dplyr)
    library(tidyr)
    smbsummary3 %>% 
        ungroup %>% 
        complete(area, small = unique(small)) %>% 
        fill(emp, worksites)
    

    -output

    # A tibble: 8 x 4
    #   area small   emp worksites
    #  <dbl> <dbl> <dbl>     <int>
    #1     1     1    36         8
    #2     1     2    36         8
    #3     1     3    36         8
    #4     1     4    36         8
    #5     3     1   100         4
    #6     3     2   201         5
    #7     3     3   561         7
    #8     3     4   822         8