Search code examples
rsasprocnlsnonlinear-functions

PROC NLIN from SAS to R


I have been currently assigned a work where I need to translate a SAS code to R. I have been able to successfully do 80% of it and now I am stuck at the part where PROC NLIN is used. From what I read, PROC NLIN is used to fit Non-Linear Models andI am not sure if the code is actually is doing that and hence, stuck at how to do it in R. The code is as follows -

proc nlin data=ds1 outest=estout;
 parms ET= 0 f= 10.68;
  E= f- R*(1-ET*M); 
  L   = E*E;
  model.like = sqrt(E*E);
  by Name ; 
run;

The sample data is as follows -

Name    M           R
Anna    0.5456231   4.118197
Anna    0.5359164   4.240243
Anna    0.541881    3.943975
Anna    0.5436047   3.822222
Anna    0.5522962   3.58813
Anna    0.5561487   3.513195
Anna    0.5423374   3.666507
Anna    0.525836    3.715371
Anna    0.5209941   3.805572
Anna    0.5304675   3.750689
Anna    0.5232541   3.788292

When I went through the pages on PROC NLIN in SAS help, the argument 'MODEL' is used to specify the equation but the code here doesn't have a model equation. Model.like is to specify the likelihood function (page 4316 - https://support.sas.com/documentation/cdl/en/statugnlin/61811/PDF/default/statugnlin.pdf) So what is this code doing? I am totally confused. I, initially felt that this can be done in R using nls() and I tried the following -

fit = nls(E~ f - R*(1-eta*M),sample, start=list(eta=0,phi=10.86)
      ,trace=T)

But I quickly realised this is wrong because the model didn't converge even after 5000 iterations. This is because, I don't have a column 'E' in my dataset. So, how is SAS doing it? Any help is appreciated!


Solution

  • First let's figure out what the SAS code is doing. PROC NLIN can be tricked into doing various minimization problems, but the setup is sometimes counter-intuitive. You need to define a dependent variable ($y$) and a predicted value based on the other variables and some parameters ($f(x, \beta$), and it will minimize $\sum_i [y_i - f(x_i,\beta)]^2$.

    The key line in defining $y$ and $f$ is

    model.like = sqrt(E*E)
    

    which is equivalent to

    model like = sqrt(E*E)
    

    So it means that the $\sum [like - \sqrt{E\cdotE}]^2$ will be minimized. Based on the example you linked I would assume that the variable like was defined earlier and it has been set to a constant 0. This means that $\sum [0- \sqrt{E\cdotE}]^2 = \sum E^2$ is being minimized.

    E was defined as f- R*(1-ET*M), so in fact $\sum [f- R*(1-ET*M)]^2$ is minimized, where f and ET are unknown parameters. I am not sure what the meaning of this, but that's what happening.

    Rewriting it to R indeed can use nls, and we could use the same trick: predicting zero.

    sample <- read.table(textConnection(
    "Name    M           R
     Anna    0.5456231   4.118197
     Anna    0.5359164   4.240243
     Anna    0.541881    3.943975
     Anna    0.5436047   3.822222
     Anna    0.5522962   3.58813
     Anna    0.5561487   3.513195
     Anna    0.5423374   3.666507
     Anna    0.525836    3.715371
     Anna    0.5209941   3.805572
     Anna    0.5304675   3.750689
     Anna    0.5232541   3.788292"), header=TRUE)
    
    nls(0 ~ f - R*(1-eta*M), data=sample, start=list(eta=0,f=10.86), trace=T)
    

    With output

    546.5988 :   0.00 10.86
    0.06273518 :  1.7259120 0.2731282
    Nonlinear regression model
      model: 0 ~ f - R * (1 - eta * M)
       data: sample
       eta      f 
    1.7259 0.2731 
     residual sum-of-squares: 0.06274
    
    Number of iterations to convergence: 1 
    Achieved convergence tolerance: 4.345e-07
    

    Note that the SAS code is run by Name, so you will have to make sure that the R code fits a different model for each name as well.