I am creating a simple cohort-based user retention model, based on the number of new users that appear each day and the likelihood that a user is still active on day 0 (100%), day 1, day 2, etc. after joining. I want to know the number of users active on each day. I am trying to vectorise this and getting in a right muddle. Here is a toy mockup.
rvec <- c(1, .8, .4)      # retention for day 0, 1, 2 (day 0 = 100%, and so forth)
newvec <- c(10, 10, 10)   # new joiners for day 0, 1, 2 (might be different)
playernumbers <- matrix(0, nrow = 3, ncol = 3)
# I want to fill matrix playernumbers such that sum of each row gives
# the total playernumbers on day rownumber-1
# here is a brute force method (could be simplified via a loop or two)
# but what I am puzzled about is whether there is a way to fully vectorise it
playernumbers[1,1] <- rvec[1] * newvec[1];
playernumbers[2,1] <- rvec[2] * newvec[1];
playernumbers[3,1] <- rvec[3] * newvec[1];
playernumbers[2,2] <- rvec[1] * newvec[2];
playernumbers[3,2] <- rvec[2] * newvec[2];
playernumbers[3,3] <- rvec[1] * newvec[3];
playernumbers
I can't figure out how to vectorise this fully. I can see how I might do it columnwise, successively using each column number to indicate (a) which rows to update (column number:nrows), and (b) which newvec index value to multiply by. But I'm not sure this is worth doing, as to me the loop is clearer. Is there a fully vectorised form that I am missing? Thanks!
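For reference, the columnwise idea described above could be sketched roughly as follows. This is only a minimal sketch: the name `n` is introduced here for illustration, and it assumes `rvec` has at least as many entries as there are days.

```r
rvec <- c(1, .8, .4)      # retention for day 0, 1, 2
newvec <- c(10, 10, 10)   # new joiners for day 0, 1, 2
n <- length(newvec)       # hypothetical helper: number of days modelled

playernumbers <- matrix(0, nrow = n, ncol = n)
for (j in seq_len(n)) {
  rows <- j:n  # the cohort joining on day j-1 can only appear from row j onwards
  # the first length(rows) retention values, scaled by that cohort's size
  playernumbers[rows, j] <- rvec[seq_along(rows)] * newvec[j]
}

rowSums(playernumbers)
#[1] 10 18 22
```

Each pass fills one column (one cohort), so the loop runs once per day rather than once per cell.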
If you don't insist on your weird indexing logic, you could simply calculate the outer product:
outer(rvec, newvec)
#     [,1] [,2] [,3]
#[1,]   10   10   10
#[2,]    8    8    8
#[3,]    4    4    4
In the outer product, rvec[2] * newvec[2] is placed at [2,2]. You place it at [3,2]. Why?
Your result:
playernumbers
#     [,1] [,2] [,3]
#[1,]   10    0    0
#[2,]    8   10    0
#[3,]    4    8   10
Edit:
This should do the same as your loop:
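rvec <- c(1, .8, .4)
newvec <- c(10, 20, 30)
tmp <- outer(rvec, newvec)        # tmp[i, j] = rvec[i] * newvec[j]
tmp <- tmp[, ncol(tmp):1]         # reverse the column order
tmp[lower.tri(tmp)] <- 0          # zero the products that fall beyond the last day
tmp <- tmp[, ncol(tmp):1]         # restore the column order
res <- tmp*0
# move the remaining products into the lower triangle, column by column
# (tmp != 0 assumes that no product is itself zero)
res[lower.tri(res, diag=TRUE)] <- tmp[tmp!=0]
res
#     [,1] [,2] [,3]
#[1,]   10    0    0
#[2,]    8   20    0
#[3,]    4   16   30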
rowSums(res)
#[1] 10 28 50
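One caveat with the `tmp[tmp!=0]` step: if any product is exactly zero (say, a day with no new joiners), fewer values are selected than there are cells to fill, and the assignment no longer lines up. A possible alternative, offered only as a sketch, is to index on the number of days since each cohort joined. The names `n`, `age`, `ok` and `totals` are made up for this example, and it assumes `rvec` is at least as long as `newvec`.

```r
rvec <- c(1, .8, .4)      # retention by days since joining (day 0, 1, 2)
newvec <- c(10, 20, 30)   # new joiners per day
n <- length(newvec)

res <- matrix(0, nrow = n, ncol = n)
age <- row(res) - col(res)   # days since the cohort in this column joined
ok <- age >= 0               # a cohort cannot be active before it joins

# Both sides are taken in column-major order, so each value lands in its own cell.
res[ok] <- rvec[age[ok] + 1] * newvec[col(res)[ok]]

rowSums(res)
#[1] 10 28 50

# If only the daily totals are needed, they are the convolution of the two
# vectors; convolve() uses an FFT, so expect tiny floating-point differences.
totals <- convolve(newvec, rev(rvec), type = "open")[seq_len(n)]
```

Because the cells are chosen by position rather than by value, zeros in `newvec` or `rvec` are handled like any other number.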