
Can I vectorise/vectorize this simple cohort retention model in R?


I am creating a simple cohort-based user retention model, based on the number of new users that appear each day, and the likelihood of a user reappearing on day 0 (100%), day 1, day 2, etc. I want to know the number of users active on each day. I am trying to vectorise this and getting in a right muddle. Here is a toy mockup.

rvec <- c(1, .8, .4);   #retention for day 0, 1,2 (day 0 = 100%, and so forth)
newvec <- c(10, 10, 10); #new joiners for day 0, 1, 2  (might be different)
playernumbers <- matrix(0, nrow = 3, ncol = 3);

# I want to fill matrix playernumbers  such that sum of each row gives 
# the total playernumbers on day rownumber-1
# here is a brute force method  (could be simplified via a loop or two)
# but what I am puzzled about is whether there is a way to fully vectorise it    
playernumbers[1,1] <- rvec[1] * newvec[1];
playernumbers[2,1] <- rvec[2] * newvec[1];
playernumbers[3,1] <- rvec[3] * newvec[1];
playernumbers[2,2] <- rvec[1] * newvec[2];
playernumbers[3,2] <- rvec[2] * newvec[2];
playernumbers[3,3] <- rvec[1] * newvec[3];
playernumbers

I can't figure out how to vectorise this fully. I can see how I might do it columnwise, successively using each column number to determine (a) which rows to update (column number to nrow), and (b) which newvec element to multiply by. But I'm not sure this is worth doing, as the loop seems clearer to me. Is there a fully vectorised form that I am missing? Thanks!
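For reference, the columnwise idea described above might look like the following. This is only a sketch of one reading of that description; the helper names `n` and `rows` are introduced here for illustration, and it assumes `rvec` has at least as many entries as `newvec`.

```r
rvec <- c(1, 0.8, 0.4)     # retention for day 0, 1, 2
newvec <- c(10, 10, 10)    # new joiners for day 0, 1, 2
n <- length(newvec)        # assumes length(rvec) >= n
playernumbers <- matrix(0, nrow = n, ncol = n)

for (j in seq_len(n)) {
  rows <- j:n              # cohort j is only present from day j onwards
  # retention restarts at day 0 for each cohort, scaled by that cohort's size
  playernumbers[rows, j] <- rvec[seq_along(rows)] * newvec[j]
}
playernumbers
```

Each pass through the loop fills one cohort's column with a single vectorised assignment, so only the iteration over cohorts remains explicit.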


Solution

  • If you don't insist on your weird indexing logic, you could simply calculate the outer product:

    outer(rvec, newvec)
    #     [,1] [,2] [,3]
    #[1,]   10   10   10
    #[2,]    8    8    8
    #[3,]    4    4    4
    

    In the outer product, the product of the second element of rvec and the second element of newvec is placed at [2,2]. You place it at [3,2]. Why?

    Your result:

    playernumbers
    #     [,1] [,2] [,3]
    #[1,]   10    0    0
    #[2,]    8   10    0
    #[3,]    4    8   10
    

    Edit:

    This should do the same as your loop, provided no entry of rvec or newvec is zero (the last step picks out values with tmp != 0):

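    rvec <- c(1, .8, .4)
    newvec <- c(10, 20, 30)
    
    tmp <- outer(rvec, newvec)     # tmp[i, j] = rvec[i] * newvec[j]
    tmp <- tmp[, ncol(tmp):1]      # reverse the column order
    tmp[lower.tri(tmp)] <- 0       # zero the products that fall outside the modelled days
    tmp <- tmp[, ncol(tmp):1]      # restore the column order
    res <- tmp*0
    # move the remaining non-zero values, column by column, into the lower triangle
    res[lower.tri(res, diag=TRUE)] <- tmp[tmp!=0]
    res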
    #     [,1] [,2] [,3]
    #[1,]   10    0    0
    #[2,]    8   20    0
    #[3,]    4   16   30
    
    rowSums(res)
    #[1] 10 28 50
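An alternative that avoids the `tmp != 0` step is to index `rvec` by the number of days since each cohort joined. The following is a minimal sketch rather than part of the original answer; the names `n`, `age` and `keep` are introduced here for illustration, and it assumes `rvec` has at least as many entries as `newvec`.

```r
rvec <- c(1, 0.8, 0.4)
newvec <- c(10, 20, 30)
n <- length(newvec)                 # assumes length(rvec) >= n

res <- matrix(0, nrow = n, ncol = n)
age <- row(res) - col(res)          # days since the cohort in each column joined
keep <- age >= 0                    # a cohort cannot be active before it joins

# retention at the cohort's age, times the size of that cohort
res[keep] <- rvec[age[keep] + 1] * newvec[col(res)[keep]]

res
rowSums(res)
```

Since the cells are chosen by position rather than by value, this should also cope with a day that has no new users or a retention rate of zero.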