Am working on a large dataset to calculate a single value in R. I believe the CUMSUM and cum product would work. But I don't know-how
county_id <- c(1,1,1,1,2,2,2,3,3)
res <- c(2,3,2,4,2,4,3,3,2)
I need a function that can simply give me a single value as follows for every county_id, then I need the total. Example, for county_id=1 the total for res is calculated manually as
2(3+2+4)+3(2+4)+2(4)
for county_id=2 the total for res is calculated manually as
2(4+3)+4(3)
for county_id=3 the total for res is calculated manually as
3(2)
Then it sums all this into a single variable
44+26+6=76
NB my county_id run from 1:47 and each county_id could have up to 200 res
Thank you
Here is one way to do this using tidyverse
functions.
For each county_id
we multiply the current res
value with the sum
of res
value after it.
library(dplyr)
library(purrr)
df1 <- df %>%
group_by(county_id) %>%
summarise(result = sum(map_dbl(row_number(),
~res[.x] * sum(res[(.x + 1):n()])), na.rm = TRUE))
df1
# county_id result
# <dbl> <dbl>
#1 1 44
#2 2 26
#3 3 6
To get total sum
you can then do :
sum(df1$result)
#[1] 76
data
county_id <- c(1,1,1,1,2,2,2,3,3)
res <- c(2,3,2,4,2,4,3,3,2)
df <- data.frame(county_id, res)