Search code examples
arraysrmatrixmodelnls

R: epidemic diffusion model using two-dimensional non-linear least squares (nls)


I'm trying to implement an epidemic diffusion model in R. The formula for the diffusion is delta_y=(a+b * y) * (N-y). y describes the current users in period t, N describes the number of potential users, delta_y describes the new users in t and a as well as b are parameters to be estimated. Note that y is the cumulative sum of all previous delta_y. For a single observation (with delta_y and y as vectors) the model simply works with:

model1 <- nls(delta_y ~ (a+b * y) * (N-y))

Now the problem is that I have a set of observations of this type and I want to estimate the same parameters a and b for all of them. I was trying to use the same formula from above but now delta_y and y are two-dimensional arrays instead of vectors. I receive the error in "qr.qty(QR, resid) : 'qr' and 'y' must have the same number of rows"

Details about the data: y as well as delta_y are two-dimensional arrays with 16 columns and 20103 rows. The arrays are created as following:

y=matrix(c(data$nearby_1998,data$nearby_1999, data$nearby_2000, ..., data$nearby_2013),nrow=20103)
invCum <- function (data) {result=matrix(nrow=nrow(data), ncol=ncol(data)); result[,1]=data[,1]; for (i in 2:ncol(data)) {result[,i] <- data[,i]-data[,i-1]}; return(result)}
delta_y <- invCum(y)

invCum is a function that returns the new users in t given the cumulated users in t (practically an inverse cumsum function).

str(y) delivers "int [1:20103, 1:16] 0 0 0 0 0 0 0 0 0 0 ...". str(delta_y) also delivers "int [1:20103, 1:16] 0 0 0 0 0 0 0 0 0 0 ...". Note that not all entries are 0, just many of the first entries.

The columns of the data each have 20103 entries. The model above worked for a single row of the data.


Solution

  • After searching the Rhelp archives for that error and finding that a similar situation was solved by Duncan Murdoch by converting the matrices to "long"-form using as.vector() and reviewing the material on nls and nlsList in Pinheiro and Bates, I'm posting the results of some experiments that may be congruent with your data situation. If I understand correctly you have 16 different "runs" of observations of delta_y and y and your hope was to model them with the same non-linear model. What's not clear yet is whether you expect them all: (A) to have the same parameters or instead (B) expect coefficients to vary with merely the same form. Let's take the (A) case first, which is what the as.vector()-solution provided by Duncan Murdoch nine years ago would deliver.

    newdf <- data.frame( d_y <- as.vector(delta_y), 
                         y = as.vector(y), 
                         grp=rep(letters[1:16], each=20103) )
    N=   _____  # you need to add this; not sure if it's a constant or vector
                # if it varies across groups need to use the rep()-strategy to add to newdf
    model1 <- nls( d_y ~ (a+b * y) * (N-y)  , data=newdf, start=list(a=0, b=1))
    

    If on the other hand you want separate coefficients:

     library(nlme)
     model1 <- nlsList(delta_y ~ (a+b * y) * (N-y) | grp, data=newdf, start=c(a=0, b=1) )
    

    Here is some testing: First on a single group (the example in ?nls):

    DNase1 <- subset(DNase, Run == 1) 
    > fm2DNase1 <- nls(density ~ 1/(1 + exp((xmid - log(conc))/scal)),
    +                  data = DNase1,
    +                  start = list(xmid = 0, scal = 1) )
    > summary(fm2DNase1)
    ==================
    Formula: density ~ 1/(1 + exp((xmid - log(conc))/scal))
    
    Parameters:
         Estimate Std. Error t value Pr(>|t|)
    xmid -0.02883    0.30785  -0.094    0.927
    scal  0.45640    0.27143   1.681    0.115
    
    Residual standard error: 0.3158 on 14 degrees of freedom
    
    Number of iterations to convergence: 14 
    Achieved convergence tolerance: 1.631e-06
    

    Now done on all groups of that dataset without regard to group-ID:

    > fm2DNase <- nls(density ~ 1/(1 + exp((xmid - log(conc))/scal)),
    +                  data = DNase,
    +                  start = list(xmid = 0, scal = 1) )
    > summary(fm2DNase)
    ==========
    Formula: density ~ 1/(1 + exp((xmid - log(conc))/scal))
    
    Parameters:
         Estimate Std. Error t value Pr(>|t|)    
    xmid -0.14816    0.09780  -1.515    0.132    
    scal  0.46736    0.08691   5.377 2.41e-07 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 0.3291 on 174 degrees of freedom
    
    Number of iterations to convergence: 13 
    Achieved convergence tolerance: 7.341e-06
    

    And finally on each group separately with the equation form remaining the only constant:

    > fm2DNase <- nlsList(density ~ 1/(1 + exp((xmid - log(conc))/scal))|Run,
    +                  data = DNase,
    +                  start = list(xmid = 0, scal = 1) )
    > summary(fm2DNase)
    Call:
      Model: density ~ 1/(1 + exp((xmid - log(conc))/scal)) | Run 
       Data: DNase 
    
    Coefficients:
       xmid 
          Estimate Std. Error     t value  Pr(>|t|)
    10 -0.23467586  0.3527077 -0.66535499 0.4749505
    11 -0.18717815  0.3522418 -0.53139112 0.5746396
    9  -0.14742434  0.3459987 -0.42608348 0.6521089
    1  -0.02882911  0.3403312 -0.08470898 0.9267180
    4  -0.01243939  0.3351487 -0.03711604 0.9691708
    8  -0.09549007  0.3408348 -0.28016525 0.7741478
    5  -0.09216741  0.3367420 -0.27370331 0.7800695
    7  -0.25657193  0.3613815 -0.70997535 0.4750054
    6  -0.25052019  0.3564816 -0.70275765 0.5051072
    2  -0.11218699  0.3245483 -0.34567120 0.7763199
    3  -0.23007674  0.3433663 -0.67006203 0.5933597
       scal 
        Estimate Std. Error  t value  Pr(>|t|)
    10 0.4904888  0.3148254 1.557971 0.1076081
    11 0.4892928  0.3138277 1.559113 0.1139307
    9  0.4723505  0.3075025 1.536087 0.1189793
    1  0.4564003  0.3000630 1.521015 0.1148339
    4  0.4423467  0.2946883 1.501066 0.1338825
    8  0.4582587  0.3018498 1.518168 0.1352101
    5  0.4473772  0.2980249 1.501140 0.1407799
    7  0.5142468  0.3234251 1.590003 0.1224310
    6  0.5007426  0.3185856 1.571768 0.1483103
    2  0.4161636  0.2878193 1.445920 0.2457047
    3  0.4654567  0.3062277 1.519969 0.2355130
    
    Residual standard error: 0.3491304 on 154 degrees of freedom