Tags: r, statistics, regression, stan, rstan

Implementing an exponential Generalized Linear Model in Stan/RStan


I have the following model:

[Image: the exponential GLM specification, y_i ~ Exponential(lambda_i) with a log link.]

I wish to learn how to implement this model.

Research and Attempt

I’ve had a look at the following Poisson GLM as an example:

[Image: the Poisson GLM example, y_i ~ Poisson(mu_i) with log(mu_i) = x_i' * beta.]

```{stan output.var="PoissonGLMQR"}
  data{
    int<lower=1> n;     // number of observations
    int<lower=1> p;     // number of predictors
    matrix[n, p] x;     // design matrix
    int<lower=0> y[n];  // responses
  }
  
  transformed data{
    matrix [n, p] Q_ast;
    matrix [p, p] R_ast;
    matrix [p, p] R_ast_inverse;
    Q_ast = qr_Q(x)[,1:p] * sqrt(n - 1.0);
    R_ast = qr_R(x)[1:p,] / sqrt(n - 1.0);
    R_ast_inverse = inverse(R_ast);
  }
  
  parameters{
    vector[p] theta;  // regression coefficients for predictors
  }
  
  transformed parameters{
    vector[p] beta;
    vector[n] mu;
    mu = exp(Q_ast*theta);
    beta = R_ast_inverse * theta;
  }
  
  model{
     y  ~ poisson(mu);
  }
```

I understand that this example has used QR reparameterization (see section 9.2 of the Stan reference manual). However, since I’m new to Stan, I’m finding this quite intimidating, and I’m unsure of how to implement the exponential GLM in the same way. Does one even need to use the QR reparameterization for the exponential case?

Anyway, my naive attempt at the exponential GLM is the following adaptation of the Poisson GLM code:

```{stan output.var="ExponentialGLM"}
  data{
    int<lower=1> n;     // number of observations
    int<lower=1> p;     // number of predictors
    matrix[n, p] x;     // design matrix
    real<lower=0> y[n];  // responses
  }
  
  transformed data{
    matrix [n, p] Q_ast;
    matrix [p, p] R_ast;
    matrix [p, p] R_ast_inverse;
    Q_ast = qr_Q(x)[,1:p] * sqrt(n - 1.0);
    R_ast = qr_R(x)[1:p,] / sqrt(n - 1.0);
    R_ast_inverse = inverse(R_ast);
  }
  
  parameters{
    vector[p] theta;  // regression coefficients for predictors
  }
  
  transformed parameters{
    vector[p] beta;
    vector[n] mu;
    mu = exp(Q_ast*theta);
    beta = (R_ast_inverse * theta);
  }
  
  model{
     y  ~ exponential(mu);
  }
```

But I'm not sure whether this is even the right way to code such a model in Stan; all I did was adapt the Poisson GLM example to the exponential GLM above.

I am seeking a demonstration of the exponential GLM and clarifications regarding my misunderstandings above.

Sample Data

    conc  time
 1   6.1   0.8
 2   4.2   3.5
 3   0.5  12.4
 4   8.8   1.1
 5   1.5   8.9
 6   9.2   2.4
 7   8.5   0.1
 8   8.7   0.4
 9   6.7   3.5
10   6.5   8.3
11   6.3   2.6
12   6.7   1.5
13   0.2  16.6
14   8.7   0.1
15   7.5   1.3

Solution

  • How about something like this?

    1. We start by reading in the sample data.

      df <- read.table(text =
          "    conc  time
       1   6.1   0.8
       2   4.2   3.5
       3   0.5  12.4
       4   8.8   1.1
       5   1.5   8.9
       6   9.2   2.4
       7   8.5   0.1
       8   8.7   0.4
       9   6.7   3.5
      10   6.5   8.3
      11   6.3   2.6
      12   6.7   1.5
      13   0.2  16.6
      14   8.7   0.1
      15   7.5   1.3", header = T);
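
      A quick scatterplot of the raw data shows time decreasing in conc, consistent with a model whose mean is log-linear in conc:

      plot(df$conc, df$time, xlab = "conc", ylab = "time");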
      
    2. We define the Stan model as a simple exponential model (i.e. a Gamma model with shape 1) with a log link.

      model <- "
      data {
          int N;                                       // Number of observations
          int K;                                       // Number of parameters
          real y[N];                                   // Response
          matrix[N,K] X;                               // Model matrix
      }
      
      parameters {
          vector[K] beta;                              // Model parameters
      }
      
      transformed parameters {
          vector[N] lambda;
          lambda = exp(-X * beta);                     // log link on the mean: E[y] = 1/lambda = exp(X * beta)
      }
      
      model {
          // Priors
          beta[1] ~ cauchy(0, 10);
          for (i in 2:K)
              beta[i] ~ cauchy(0, 2.5);
      
          y ~ exponential(lambda);
      }
      "
      

      Here I assume (standard) weakly informative Cauchy priors on the beta model parameters.
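
      To get a rough feel for what these priors imply, one can draw from them and look at the implied mean response E[y] = 1/lambda = exp(X * beta). A minimal sketch, at an arbitrary concentration of 5:

      set.seed(1);
      b1 <- rcauchy(1e4, 0, 10);     # intercept prior draws
      b2 <- rcauchy(1e4, 0, 2.5);    # slope prior draws
      summary(exp(b1 + b2 * 5));     # implied mean times; extremely heavy-tailed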

    3. We now fit the model to the data.

      library(rstan);
      options(mc.cores = parallel::detectCores());
      rstan_options(auto_write = TRUE);
      
      X <- model.matrix(time ~ conc, data = df);
      model.stan <- stan(
          model_code = model,
          data = list(
              N = nrow(df),
              K = ncol(X),
              y = df$time,
              X = X),
          iter = 4000);
      
    4. To compare point estimates, we also fit a Gamma GLM to the data using glm.

      model.glm <- glm(time ~ conc, data = df, family = Gamma(link = "log"));
      
    5. The Stan model parameter estimates are

      summary(model.stan, "beta")$summary;
      #              mean     se_mean         sd       2.5%        25%        50%
      #beta[1]  2.9371457 0.016460000 0.62017934  1.8671652  2.5000356  2.8871936
      #beta[2] -0.3099799 0.002420744 0.09252423 -0.5045937 -0.3708111 -0.3046333
      #               75%      97.5%    n_eff     Rhat
      #beta[1]  3.3193334  4.3078478 1419.629 1.002440
      #beta[2] -0.2461567 -0.1412095 1460.876 1.002236        
      

      The glm parameter estimates are

      summary(model.glm)$coef;
      #              Estimate Std. Error   t value     Pr(>|t|)
      #(Intercept)  2.8211487 0.54432115  5.182875 0.0001762685
      #conc        -0.3013355 0.08137986 -3.702827 0.0026556952
      

      The Stan posterior means agree well with the Gamma GLM point estimates.
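
      For a direct side-by-side view, one can bind the two sets of point estimates into a single table; a minimal sketch:

      cbind(
          stan = summary(model.stan, "beta")$summary[, "mean"],
          glm = unname(coef(model.glm)));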

    6. We plot parameter estimates from both the Stan and glm models.

      library(tidyverse);
      as.data.frame(summary(model.stan, "beta")$summary) %>%
          rownames_to_column("Parameter") %>%
          ggplot(aes(x = `50%`, y = Parameter)) +
          geom_point(size = 3) +
          geom_segment(aes(x = `2.5%`, xend = `97.5%`, yend = Parameter), size = 1) +
          geom_segment(aes(x = `25%`, xend = `75%`, yend = Parameter), size = 2) +
          labs(x = "Median (plus/minus 95% and 50% CIs)") +
          geom_point(
              data = data.frame(summary(model.glm)$coef, row.names = c("beta[1]", "beta[2]")) %>%
                  rownames_to_column("Parameter"),
              aes(x = Estimate, y = Parameter),
              colour = "red", size = 4, shape = 1)
      

      [Image: plot of the posterior medians of beta[1] and beta[2] with 50% and 95% intervals.]

      glm estimates are shown in red.
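
      As an additional sanity check, one can overlay the implied mean function E[y] = exp(beta[1] + beta[2] * conc), evaluated at the posterior medians, on the raw data:

      beta.med <- summary(model.stan, "beta")$summary[, "50%"];
      plot(df$conc, df$time, xlab = "conc", ylab = "time");
      curve(exp(beta.med[1] + beta.med[2] * x), add = TRUE, col = "blue");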


    Update (Fit Stan model using QR re-parametrisation)

    1. Stan model with QR re-parametrisation.

      model.QR <- "
      data {
          int N;                                       // Number of observations
          int K;                                       // Number of parameters
          real y[N];                                   // Response
          matrix[N,K] X;                               // Model matrix
      }
      
      transformed data {
          matrix[N, K] Q;
          matrix[K, K] R;
          matrix[K, K] R_inverse;
          Q = qr_Q(X)[, 1:K];
          R = qr_R(X)[1:K, ];
          R_inverse = inverse(R);
      }
      
      parameters {
          vector[K] theta;
      }
      
      transformed parameters {
          vector[N] lambda;
          lambda = exp(-Q * theta);                    // log link, as before; theta = R * beta
      }
      
      model {
          y ~ exponential(lambda);
      }
      
      generated quantities {
          vector[K] beta;                              // Model parameters
          beta = R_inverse * theta;
      }
      "
      

      With the QR decomposition X = Q * R, we have lambda = exp(-X * beta) = exp(-Q * R * beta) = exp(-Q * theta), where theta = R * beta and therefore beta = R^-1 * theta.

      Note that we do not specify any priors on theta; in Stan, parameters without an explicit prior get an improper flat (uniform) prior by default.
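
      The decomposition itself is easy to verify numerically in base R (base R's qr() may differ from Stan's qr_Q/qr_R in sign conventions, but Q * R always reproduces X):

      QR <- qr(X);
      Q <- qr.Q(QR);    # N x K orthogonal factor
      R <- qr.R(QR);    # K x K upper-triangular factor
      all.equal(Q %*% R, X, check.attributes = FALSE);    # TRUE, up to numerical error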

    2. Fit Stan model to the data.

      library(rstan);
      options(mc.cores = parallel::detectCores());
      rstan_options(auto_write = TRUE);
      
      X <- model.matrix(time ~ conc, data = df);
      model.stan.QR <- stan(
          model_code = model.QR,
          data = list(
              N = nrow(df),
              K = ncol(X),
              y = df$time,
              X = X),
          iter = 4000);
      
    3. Parameter estimates

      summary(model.stan.QR, "beta")$summary;
      #              mean     se_mean         sd       2.5%        25%        50%
      #beta[1]  2.9637547 0.009129250 0.64383609  1.8396681  2.5174800  2.9194682
      #beta[2] -0.3134252 0.001321584 0.09477156 -0.5126905 -0.3740475 -0.3093937
      #               75%      97.5%    n_eff      Rhat
      #beta[1]  3.3498585  4.3593912 4973.710 0.9998545
      #beta[2] -0.2478029 -0.1395686 5142.408 1.0003236
      

      You can see that the parameter estimates from the two Stan models (with and without QR re-parametrisation) agree very well.

      If you were to ask me whether the QR re-parametrisation makes a (big/any) difference here, I'd say "probably not in this case". Andrew Gelman and others have often emphasised that even very weakly informative priors help with convergence and should be preferred over flat (uniform) priors. I would always try weakly informative priors on all parameters first, starting with a model without the QR re-parametrisation; if convergence turns out to be poor, I would then try to optimise the model in a next step.
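
      For example, adding a weakly informative prior on theta to the QR model above is a one-line change to its model block; a minimal sketch, where normal(0, 5) is purely an illustrative scale:

      model.QR.wip <- sub(
          "y ~ exponential(lambda);",
          "theta ~ normal(0, 5);    // weakly informative (illustrative)\n    y ~ exponential(lambda);",
          model.QR, fixed = TRUE);

      model.QR.wip can then be passed to stan() via model_code exactly as before.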