Search code examples
rsaslogistic-regression

How to get the fitted values with confidence bands when using PROC NLMIXED


I am analyzing a dataset that contains tree measurements such as the diameter (cm) and whether it is dead or alive (0/1). The measurement collection was irregular, i.e. starting in 1960 and is still being measured. A total of 10,000 plots have been established so far scattered across a large geographic region and all trees taller than 1.3m in height were permanentely tagged and monitored. That also means that if new trees grew in the plot and they passed 1.3m they were also included in the collection. Each plot consists on average of about 100 trees. Most of the plots were measured every 5-7 years or so, and some plots only twice, and some plots over 10 times, meaning I have repeated measurements of unique trees. Once a tree died, its final diameter was recored and then excluded from further measurements.

Here is some structure of a subset of the data:

> glimpse(d)
Rows: 15,472
Columns: 12
$ plot_number          <dbl> 272.1, 272.1, 272.1, 272.1, 272.1, 272.1, 272.1, 27…
$ establishment_year   <dbl> 1965, 1965, 1965, 1965, 1965, 1965, 1965, 1965, 196…
$ measurement_year     <dbl> 1977, 1977, 1977, 1977, 1977, 1977, 1977, 1977, 197…
$ measurement_interval <dbl> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,…
$ tree_number          <dbl> 1, 5, 6, 7, 12, 15, 18, 19, 20, 22, 24, 26, 27, 29,…
$ species              <chr> "Pl", "Pl", "Pl", "Pl", "Pl", "Pl", "Pl", "Pl", "Pl…
$ dbh                  <dbl> 27.9, 30.2, 13.0, 14.2, 10.9, 19.8, 9.1, 36.8, 24.4…
$ survival             <dbl> 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ tree_id              <chr> "272.1-1", "272.1-5", "272.1-6", "272.1-7", "272.1-…

> d %>% 
+   group_by(plot_number, measurement_year) %>% 
+   summarise(number_of_trees = n())
`summarise()` has grouped output by 'plot_number'. You can override using
the `.groups` argument.
# A tibble: 210 × 3
# Groups:   plot_number [72]
           plot_number measurement_year number_of_trees
                 <dbl>            <dbl>           <int>
 1                 6.1             1965              33
 2                 6.1             1968              37
 3                 6.1             1973              34
 4                 6.1             1982              23
 5                 6.1             1992              11
 6                 6.2             1965              35
 7                 6.2             1968              34
 8                 6.2             1973              34
 9                 6.2             1982              28
10                 6.2             1992              20
# ℹ 200 more rows
# ℹ Use `print(n = ...)` to see more rows

> d %>% 
+   group_by(plot_number, tree_number) %>% 
+   summarise(tree_measurements = n()) %>% 
+   arrange(plot_number, tree_number) 
`summarise()` has grouped output by 'plot_number'. You can override using
the `.groups` argument.
# A tibble: 6,725 × 3
# Groups:   plot_number [72]
           plot_number tree_number tree_measurements
                 <dbl>       <dbl>             <int>
 1                 6.1         531                 4
 2                 6.1         532                 3
 3                 6.1         533                 5
 4                 6.1         534                 3
 5                 6.1         536                 3
 6                 6.1         537                 4
 7                 6.1         540                 2
 8                 6.1         542                 4
 9                 6.1         543                 2
10                 6.1         546                 3
# ℹ 6,715 more rows
# ℹ Use `print(n = ...)` to see more rows

I want to use these data to generate a survival model by species, i.e. survival as a function of diameter (fitted values with confidence bands).

In R, I did it like this:

require(glmmTMB)
require(ggeffects)
m <- glmmTMB(survival ~ dbh + (1|tree_id) , 
             family = binomial(link = "logit"), 
             data = d)

m_pred <- ggpredict(m, terms = "dbh [all]")

plot(m_pred) 

enter image description here

However, the predicted values need to be adjusted for the time between subsequent measurements of the same tree. In the literature, the following generalized logistic model is recommended for interval lengths of 5-7 years (A Generalized Mixed Logistic Model for Predicting Individual Tree Survival Probability with Unequal Measurement Intervals):

$$P = \left( \frac{e^\eta}{1 + e^\eta} \right)^{\Delta t}$$

This is where I had to move over to SAS proc nlmixed because I couldn't figure out a way to code this in R (although there is a @BenBolker hack here that seems to work for my data). Anyway, I want to get this running in SAS as well and then compare with Ben Bolker's code. This is the SAS code I use:

proc nlmixed data=d;
    parms b0=1 b1=0.01 ss=1; 
    eta = (b0+u1)+b1*dbh ; 
    prob = (exp(eta)/(1+exp(eta)))**measurement_interval; /*ADJUSTED LOGISTIC FUNCTION*/
    model survival ~ binary(prob);  
     random u1 ~ normal(0,ss*ss) subject= tree_id;
      predict prob out=predv;
      predict u1 out=ran;
      
run;

What I cannot figure out right now is how to get the same plot as above (fitted values with confidence bands)... I did use the predict option in SAS but the predicted values seem to be the predicted values for the individual observations and not the fitted values. While, I can create the fitted line (as above) using the estimated model parameters I don't know how to also add the confidence bands for those fitted values.

On this website (https://stats.oarc.ucla.edu/sas/faq/how-can-i-run-simple-linear-and-nonlinear-models-using-nlmixed/) is a link to a SAS toy dataset for PROC NLMIXED. Using this dataset, I ran the following code which resulted in this graph (which I don't want):

proc nlmixed data="C:\hsbdemo.sas7bdat";
  parms b0=5 b1=0 ss=1;
  xb=(b0+u1)+b1*read;
  prob = exp(xb)/(1+exp(xb));
  model honors ~ binary(prob);
  random u1 ~ normal(0,ss*ss) subject = ses;
  predict prob out=pred;
run;

proc sgplot data=pred;
    series x=read y=pred / lineattrs=(color=blue);
    band x=read lower=lower upper=upper / transparency=0.5;
    xaxis label="Read";
    yaxis label="Predicted Probability";
run;

enter image description here

So my question is how can I replicate the plot generated in R (first plot above) using SAS and PROC NLMIXED?

EDIT

As per @Tom 's suggestion, here is a way that will calculate the required confidence intervals. The following example is in R but I would like it to work in SAS for a model run in PROC NLMIXED. I added comments that explain the steps hopefully clearly enough that someone who doesn't know R but knows SAS can replicate it. However, ideally I would like a SAS procedure that is programmed to do it instead of doing it manually.

require(haven)
require(ggeffects)
require(glmmTMB)

# read data (https://stats.oarc.ucla.edu/sas/faq/how-can-i-run-simple-linear-and-nonlinear-models-using-nlmixed/)
d <- read_sas("hsbdemo.sas7bdat")

# run logistic regression
m <- glmmTMB(HONORS ~ READ + (1|SES), data = d, family = binomial(link = "logit"))

# extract fixed effects parameter estimates
coef <- fixef(m)$cond

# create data frame with sequence of predictions
new_preds <- data.frame(INTERCEPT = 1, READ = seq(min(d$READ),max(d$READ), by = 0.1))

# create design matrix
X <- as.matrix(new_preds)

# matrix multiply design matrix with parameter vector to get predictions
eta <- X %*% coef

# extract variance-covariance matrix
vcov_model <- vcov(m)$cond

# matrix multiply design matrix with variance covariance matrix with the transpose of the design matrix 
vcov_preds <- diag(X %*% vcov_model %*% t(X))

# calculate standard deviations (errors)
se_logit <- sqrt(vcov_preds)

# transform prediction from logit to probability scale
prob <- plogis(eta)

# calculate confidence intervals and transform from logit to probability scale
lower <- plogis(eta - 1.96 * se_logit)
upper <- plogis(eta + 1.96 * se_logit)

# plot results
plot(new_preds$READ, prob)
lines(new_preds$READ, lower, col = "red")
lines(new_preds$READ, upper, col = "red")

And this is how it should look like (not like the one above).

enter image description here

Can this plot be reproduced for a model fit in PROC NLMIXED?


Solution

  • I came across your post while searching for a way to add confidence interval bands on regression lines using nlmixed. If you sort the prediction output by read(x variable), you will get reasonable but jagged line and bands because of random effects.

    proc sort data=pred_n;
    by read;
    run;
    

    Plot after sorting data by variable x The post at https://agstats.io/tutorials/sas-nonlinear (example under 8 Mixed model estimation) showed an example of adjusting the prediction function for the random effects, without the random parameter in the predict statement can get something like this:

    predict  exp(b0+b1*(read))/(1+exp(b0+b1*(read))) out=pred;
    

    without random parameter in the prediction function

    I am new to nlmixed, maybe some advanced users can post the correct model setting for this purpose?