Search code examples
rdplyrsumproduct

Sumproduct in R using dplyr


Using dplyr, I´m trying to sumproduct in R two columns as it can be done in Excel but although I´ve tried several aproaches, none worked properly.

A small piece of my dataframe (almost 61000 rows and 20 columns) looks like this:

df <-structure(list(dem_sect = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("AB", 
"EP", "FE", "MF", "PA"), class = "factor"), cod_estr = c("a", 
"a", "b", "b", "c", "b", "b", "b", "a", "b", "c", "b", "b", "a", 
"b", "c", "c", "b", "c"), sect_estr = structure(c(10L, 10L, 11L, 
11L, 12L, 11L, 11L, 11L, 10L, 11L, 12L, 11L, 11L, 10L, 11L, 12L, 
12L, 11L, 12L), .Label = c("ABa", "ABb", "ABc", "EPa", "EPb", 
"EPc", "FEa", "FEb", "FEc", "MFa", "MFb", "MFc", "PAa", "PAb", 
"PAc"), class = "factor"), area = c(14L, 14L, 24L, 24L, 11L, 
24L, 24L, 24L, 14L, 24L, 11L, 24L, 24L, 14L, 24L, 11L, 11L, 24L, 
11L), kg_med = c(5.88125, 5.88125, 6.8505, 6.8505, 
3.0852, 6.8505, 6.8505, 6.8505, 5.88125, 6.8505, 3.0852, 6.8505, 
6.8505, 5.88125, 6.8505, 3.0852, 3.0852, 6.8505, 3.0852)), class = "data.frame", row.names = c(NA, 
-19L))

I want to add a new column (called for example sumprod) which would be the sumproduct of the unique values of both "area" and "kg_med" grouped by "dem_sect"

The expect output will be a column with the value 280.69 (If I did the calculation correctly in Excel)

Any help will be more than welcome. Thanks in advance.


Solution

  • library(tidyverse)
    
    df %>% 
      group_by(dem_sect) %>% 
      filter(!duplicated(area, kg_med)) %>% 
      summarise(sumproduct = sum(area * kg_med)) %>% 
      as.data.frame() # to view the result with more significant digits
    
    #>   dem_sect sumproduct
    #> 1       MF   280.6867