So for this task, in my real data set. I have 18 rows that are indcode = 000000 and ownership code = 10. The differentiating factor is area. Likewise, I have 18 rows that are indcode = 4911 and ownership code = 10. Sample data below narrows it to 4 for easy computing. Some context.. in my real data set, I have monthly data hence the year (02) and Month (Jan) from 02-Jan to 23-Jun. The 910 is the new indcode.. it represents total federal employment in a given area and time. Federal employment is defined as indcode = 000000 subtract indcode = 4911. the indcode = 55 is just to make it more realistic.
PS, I am having some difficulty with the "02-Jan" so feel free to rename it as Jan. Just trying to keep it in line with the real product.
indcode <- c("000000","000000","000000","000000", "55", "4911","4911","4911","4911")
ownership <- c("10","10","10","10","10","10","10","10","10")
area <- c("000000","031","029","017","029","000000","031","029","017")
"02-Jan" <- c(1000,600,300,100,50,100,50,40,10)
"02-Feb" <- c(1003,601,301,101,51,101,51,41,11)
first <- data.frame(indcode, ownership, area, `02-Jan`, `02-Feb`)
so for each area, here is an example. the actual 02 values would not be 1000-100 but rather 900 but I thought that this would make it clearer.
indcode ownership area 02-Jan 02-Feb
910 10 000000 1000-100 1003-101
910 10 031 600-50 601-51
library(dplyr)
first |>
summarize(across(3:4, ~max(.)-min(.)),
# OLD: summarize(across(3:4, ~paste(rev(range(.)), collapse = "-")),
.by = area) |>
#"3:4" refers to the 3rd and 4th column once we set aside the area grouping
# We could alternated specify the columns by name, e.g. X02.Jan:X02.Feb
mutate(indcode = 910, ownership = 10, .before = 1)
Result
indcode ownership area X02.Jan X02.Feb
1 910 10 000000 900 902
2 910 10 031 550 550
3 910 10 029 260 260
4 910 10 017 90 90