Search code examples
rcumsum

cumsum and product based on Unique ID


Am working on a large dataset to calculate a single value in R. I believe the CUMSUM and cum product would work. But I don't know-how

county_id <- c(1,1,1,1,2,2,2,3,3)
res <- c(2,3,2,4,2,4,3,3,2)

I need a function that can simply give me a single value as follows for every county_id, then I need the total. Example, for county_id=1 the total for res is calculated manually as

2(3+2+4)+3(2+4)+2(4)

for county_id=2 the total for res is calculated manually as

2(4+3)+4(3)

for county_id=3 the total for res is calculated manually as

3(2)

Then it sums all this into a single variable

44+26+6=76

NB my county_id run from 1:47 and each county_id could have up to 200 res

Thank you


Solution

  • Here is one way to do this using tidyverse functions.

    For each county_id we multiply the current res value with the sum of res value after it.

    library(dplyr)
    library(purrr)
    
    df1 <- df %>%
             group_by(county_id) %>%
             summarise(result = sum(map_dbl(row_number(), 
                               ~res[.x] * sum(res[(.x + 1):n()])), na.rm = TRUE))
    
    df1
    #  county_id result
    #      <dbl>  <dbl>
    #1         1     44
    #2         2     26
    #3         3      6
    

    To get total sum you can then do :

    sum(df1$result)
    #[1] 76
    

    data

    county_id <- c(1,1,1,1,2,2,2,3,3)
    res <- c(2,3,2,4,2,4,3,3,2)
    df <- data.frame(county_id, res)