This is my starting df
test <- data.frame(year = c(2018,2018,2018,2018,2018),
source = c("file1", "file1", "file1", "file1", "file1"),
area = c("000", "000", "800", "800", "800"),
cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"),
value = c(1000,2000,3000,4000,5000))
year source area cult2 value
1 2018 file1 000 PBGEX 1000
2 2018 file1 000 QPGEX 2000
3 2018 file1 800 PBGEX 3000
4 2018 file1 800 QPGEX 4000
5 2018 file1 800 QPIND 5000
I need to get for each year/source/area the sum of value, for the fields PBGEX and QPGEX.
I was thinking of using spread
and gather
but I m losing many others columns (not show here).
This what I would except :
year source area cult2 value
1 2018 file1 000 PBGEX 1000
2 2018 file1 000 QPGEX 2000
3 2018 file1 800 PBGEX 3000
4 2018 file1 800 QPGEX 4000
5 2018 file1 800 QPIND 5000
6 2018 file1 000 RDGEX 3000
7 2018 file1 800 RDGEX 7000
We can filter
the rows where the 'cult2' are 'QPGEX', 'PBGEX', then do a group_by
sum
and bind_rows
with original dataset
library(dplyr)
test %>%
filter(cult2 %in% c("QPGEX", "PBGEX")) %>%
group_by(year, source, area) %>%
summarise(cult2 = "RDGEX", value = sum(value), .groups = 'drop') %>%
bind_rows(test, .)
-output
# year source area cult2 value
#1 2018 file1 000 PBGEX 1000
#2 2018 file1 000 QPGEX 2000
#3 2018 file1 800 PBGEX 3000
#4 2018 file1 800 QPGEX 4000
#5 2018 file1 800 QPIND 5000
#6 2018 file1 000 RDGEX 3000
#7 2018 file1 800 RDGEX 7000
If we need a prop
ortion column
test %>%
filter(cult2 %in% c("QPGEX", "PBGEX")) %>%
group_by(year, source, area) %>%
group_by(prop = value[cult2== "QPGEX"]/value[cult2 == "PBGEX"],
.add = TRUE) %>%
summarise(cult2 = "RDGEX", value = sum(value), .groups = 'drop') %>%
bind_rows(test, .)
Or it can be also
library(tidyr)
test %>%
filter(cult2 %in% c("QPGEX", "PBGEX")) %>%
pivot_wider(names_from = cult2, values_from = value) %>%
# or use spread
#spread(cult2, value) %>%
mutate(prop = QPGEX/PBGEX) %>%
select(-PBGEX, -QPGEX) %>%
right_join(test)
-output
# A tibble: 5 x 6
# year source area prop cult2 value
# <dbl> <chr> <chr> <dbl> <chr> <dbl>
#1 2018 file1 000 2 PBGEX 1000
#2 2018 file1 000 2 QPGEX 2000
#3 2018 file1 800 1.33 PBGEX 3000
#4 2018 file1 800 1.33 QPGEX 4000
#5 2018 file1 800 1.33 QPIND 5000