Using dplyr, I´m trying to sumproduct in R two columns as it can be done in Excel but although I´ve tried several aproaches, none worked properly.
A small piece of my dataframe (almost 61000 rows and 20 columns) looks like this:
df <-structure(list(dem_sect = structure(c(4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("AB",
"EP", "FE", "MF", "PA"), class = "factor"), cod_estr = c("a",
"a", "b", "b", "c", "b", "b", "b", "a", "b", "c", "b", "b", "a",
"b", "c", "c", "b", "c"), sect_estr = structure(c(10L, 10L, 11L,
11L, 12L, 11L, 11L, 11L, 10L, 11L, 12L, 11L, 11L, 10L, 11L, 12L,
12L, 11L, 12L), .Label = c("ABa", "ABb", "ABc", "EPa", "EPb",
"EPc", "FEa", "FEb", "FEc", "MFa", "MFb", "MFc", "PAa", "PAb",
"PAc"), class = "factor"), area = c(14L, 14L, 24L, 24L, 11L,
24L, 24L, 24L, 14L, 24L, 11L, 24L, 24L, 14L, 24L, 11L, 11L, 24L,
11L), kg_med = c(5.88125, 5.88125, 6.8505, 6.8505,
3.0852, 6.8505, 6.8505, 6.8505, 5.88125, 6.8505, 3.0852, 6.8505,
6.8505, 5.88125, 6.8505, 3.0852, 3.0852, 6.8505, 3.0852)), class = "data.frame", row.names = c(NA,
-19L))
I want to add a new column (called for example sumprod) which would be the sumproduct of the unique values of both "area" and "kg_med" grouped by "dem_sect"
The expect output will be a column with the value 280.69
(If I did the calculation correctly in Excel)
Any help will be more than welcome. Thanks in advance.
library(tidyverse)
df %>%
group_by(dem_sect) %>%
filter(!duplicated(area, kg_med)) %>%
summarise(sumproduct = sum(area * kg_med)) %>%
as.data.frame() # to view the result with more significant digits
#> dem_sect sumproduct
#> 1 MF 280.6867