So here is the sample data and the packages. The code that I am using is below. It works for the first four rows but after that things go awry. The desired result is at the very bottom. I need the cumsum to only look at the area, period combination... 001 and 2020q1. In this case, there will be 4 groupings (001/2020q1, 003/2020q1, 001/2020q2, 003/2020q2). How would I go about doing such a process? I have a feeling that I am missing something in the group by clause but going in circles as of yet.
This is a continuation of a previous question. This has more data and is a bit more involved.
library(readxl)
library(dplyr)
library(data.table)
library(odbc)
library(DBI)
library(stringr)
employment <- c(1,45,125,130,165,260,2,46,127,132,167,265,50,61,110,121,170,305,55,66,112,123,172,310)
small <- c(1,1,2,2,3,4,1,1,2,2,3,4,1,1,2,2,3,4,1,1,2,2,3,4)
area <-c(001,001,001,001,001,001,001,001,001,001,001,001,003,003,003,003,003,003,003,003,003,003,003,003)
year<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020)
qtr <-c(1,1,1,1,1,1,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2)
smbtest <- data.frame(employment,small,area,year,qtr)
smbsummary2<-smbtest %>%
mutate(period = paste0(year,"q",qtr)) %>%
select(area,period,employment,small) %>%
group_by(area,period,small) %>%
summarise(employment = sum(employment), worksites = n(),
.groups = 'drop') %>%
mutate(employment = cumsum(employment),
worksites = cumsum(worksites))
area period small employment worksites
001 2020q1 1 46 2
001 2020q1 2 303 4
001 2020q1 3 466 5
001 2020q1 4 726 6
003 2020q1 1 48 2
003 2020q1 2 307 4
003 2020q1 3 474 5
003 2020q1 4 739 6
001 2020q2 1 111 2
001 2020q2 2 342 4
001 2020q2 3 512 5
001 2020q1 4 817 6
and so on.
The .groups = 'drop'
removes all the groups, instead we need .groups = 'drop_last'
. Based on the expected output showed, it should be the 'small' columns that should be dropped. By default, the summarise
does the .groups = 'drop_last
and if we want to specify it to remove the warnings, it can be done
smbsummary2 <- smbtest %>%
mutate(period = paste0(year,"q",qtr)) %>%
select(area,period,employment,small) %>%
group_by(area,period,small) %>%
summarise(employment = sum(employment), worksites = n(),
.groups = 'drop_last') %>% mutate(employment = cumsum(employment),
worksites = cumsum(worksites))