I have a large data set covering around 4000 regions.
For each row, I need to multiply every number by the name of its column (0, 1, 2, ...), sum the products, and then divide that sum by the total for the row (totan).
For example, the data is like this:
region totan 0 1 2 3 4 5 6 7 .....
1 1346 5 7 3 9 23 24 34 54 .....
2 1256 7 8 4 10 34 2 14 30 .....
3 1125 83 43 23 11 16 4 67 21 .....
4 3211 43 21 67 12 13 12 98 12 .....
5 1111 21 8 9 3 23 13 11 0 .....
.... .... .. .. .. .. .. .. .. .. .....
4000 2345 21 9 11 45 67 89 28 7 .....
The calculation should be like this:
For example, in region 1: ((5*0)+(7*1)+(3*2)+(9*3)+(23*4)+(24*5)+(34*6)+(54*7)+...)/1346 = the result
I need to do this calculation for all the regions. I tried a couple of approaches, such as "for" loops and "apply", but did not get the required result.
You could use the tidyverse:
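library(tidyverse)
# Reshape to long form: k holds the column name (0, 1, 2, ...), v the count
df %>% gather(k, v, -region, -totan) %>%
  group_by(region, totan) %>%
  # Sum of column name times count, divided by the row total
  summarize(x = sum(as.numeric(k) * v) / first(totan))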
## A tibble: 5 x 3
## Groups: region [?]
# region totan x
# <int> <int> <dbl>
#1 1 1346 0.620
#2 2 1256 0.387
#3 3 1125 0.671
#4 4 3211 0.304
#5 5 1111 0.232
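If you would rather avoid reshaping, the same weighted sum can probably be done in base R as a matrix product. This is only a sketch: it rebuilds the first five regions and columns 0 to 7 from the question as a sample `df`, and it assumes every column other than `region` and `totan` has a numeric name. The names `value_cols` and `weights` are mine, not from the question.

```r
# Sample data: first five regions and columns 0-7 from the question
df <- data.frame(
  region = 1:5,
  totan  = c(1346, 1256, 1125, 3211, 1111),
  `0` = c(5, 7, 83, 43, 21),
  `1` = c(7, 8, 43, 21, 8),
  `2` = c(3, 4, 23, 67, 9),
  `3` = c(9, 10, 11, 12, 3),
  `4` = c(23, 34, 16, 13, 23),
  `5` = c(24, 2, 4, 12, 13),
  `6` = c(34, 14, 67, 98, 11),
  `7` = c(54, 30, 21, 12, 0),
  check.names = FALSE  # keep "0", "1", ... as column names
)

# Assumption: everything except region and totan is a count column
value_cols <- setdiff(names(df), c("region", "totan"))

# The column names, converted to numbers, act as the weights
weights <- as.numeric(value_cols)

# One matrix product gives the weighted sum for every row at once
df$x <- as.vector(as.matrix(df[value_cols]) %*% weights) / df$totan

round(df$x, 3)
```

On this sample it should give the same `x` values as the tidyverse version above. With 4000 rows a single matrix product is also likely to be quicker than looping over rows, though I have not timed it.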