Tags: r, bayesian, mcmc, multivariate-testing

How to properly code a scaled inverse Wishart prior for an MCMCglmm model?


I am running a multivariate model (4 response variables) with two random effects using MCMCglmm(). I am currently using an inverse Wishart prior.

## current setup for the prior
prior.miw <- list(
  R = list(V = diag(4), nu = 0.002),
  G = list(
    G1 = list(V = diag(4),
              nu = 0.002,
              alpha.mu = c(0, 0, 0, 0),
              alpha.V = diag(4) * 1000),
    G2 = list(V = diag(4),            # need to repeat for the second random effect
              nu = 0.002,
              alpha.mu = c(0, 0, 0, 0),
              alpha.V = diag(4) * 1000)
  )
)
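
For reference, here is how a prior like this is passed to the model call (the responses y1-y4, the grouping factors id1/id2, and the data frame dat below are placeholder names, and the Gaussian families are just for illustration):

## hypothetical model call showing where prior.miw plugs in
library(MCMCglmm)
m <- MCMCglmm(cbind(y1, y2, y3, y4) ~ trait - 1,
              random = ~ us(trait):id1 + us(trait):id2,  # two 4x4 G structures
              rcov   = ~ us(trait):units,                # 4x4 residual structure
              family = rep("gaussian", 4),
              prior  = prior.miw,
              data   = dat)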

Based on my model, it appears I should really run a scaled inverse Wishart prior (see page 14 of Lemoine 2019).

[image from the question: the scaled inverse-Wishart prior specification from Lemoine 2019, p. 14]

My question: how do I make this into a prior that is suitable for my MCMCglmm() model?

Additionally, if this is NOT the sort of prior I should be running, please feel free to weigh in.


Solution

  • This is a two-part question:

    • what priors should I use for a multivariate random effect where the likelihood is concentrated at small values? (I am assuming that this is the reason you are looking for an alternative to the default inverse Wishart priors) [more suitable for CrossValidated]
    • which of these are available in MCMCglmm, and how do I implement them there? [good for Stack Overflow]

    The general trick is to decompose the covariance matrix into a multivariate component (the correlation matrix, or inverse correlation matrix, or something similar) and a vector of scaling parameters for the standard deviations (or inverse standard deviations); a sketch of one draw from such a prior follows below. Lemoine suggests U(0, 100) for the scaling priors, which I think is a bad choice (why flat? I can't get to the precise page of Gelman and Hill 2007 where they discuss which distribution to use for scaling priors, but I would be a little surprised if they actually recommended a uniform distribution on the variance scale).
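
    For concreteness, here is a minimal base-R sketch of one draw from a prior built by this decomposition, using Lemoine's U(0, 100) choice for the scale parameters (the df = d + 1 choice for the inverse-Wishart part is my assumption, not something from the paper):

    ## one draw from a scaled inverse-Wishart prior:
    ## Sigma = diag(xi) %*% Q %*% diag(xi)
    d  <- 4
    Q  <- solve(rWishart(1, df = d + 1, Sigma = diag(d))[, , 1])  # inverse-Wishart draw
    xi <- runif(d, 0, 100)                                        # U(0, 100) scale parameters
    Sigma <- diag(xi) %*% Q %*% diag(xi)                          # scaled inverse-Wishart draw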

    Update, having actually looked at your code (!): I think you're doing the right thing, except that nu=0.002 seems really extreme; see the end of this answer for that discussion.

    This is basically what MCMCglmm does, but it uses a different (IMO better) choice for the scaling priors. It sounds scary:

    These priors are all from the non-central scaled F-distribution, which implies the prior for the standard deviation is a non-central folded scaled t-distribution (Gelman, 2006).

    but it boils down to choosing four parameters, only two of which you really have to think about.

    • V: the prior mean variance (or the prior mean covariance matrix, if you have a multivariate random effect term). According to the course notes, "without loss of generality V can be set to one" (or in the case of a multivariate model, to an identity matrix)
    • alpha.mu: we almost always want this to be zero (or as in your example, a vector of zeros); that way the prior for the standard deviation will be a Student t distribution. (There may be a use case for alpha.mu != 0, but I've never run across it.)
    • alpha.V: with V set to 1 (or an identity matrix), this is the prior mean of the covariance matrix. A diagonal matrix with a reasonable scale for your problem is a good choice
    • nu: the shape parameter; as nu → ∞ we get a half-Normal prior for the standard deviations, and with nu=1 we get a (half-)Cauchy. Smaller values have fatter tails (less conservative, allowing broader samples, but also giving more danger of weird sampling behaviour in the tails). A worked example putting these choices together follows this list.
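
    A hedged example of a parameter-expanded prior for a model like yours (alpha.V = diag(4) * 25, i.e. a prior scale of 5 for each standard deviation, and the residual nu = 4 are my illustrative choices, not recommendations for your data):

    prior.px <- list(
      R = list(V = diag(4), nu = 4),         # residual structure: cannot be
                                             # parameter-expanded in MCMCglmm;
                                             # nu = 4 is just an illustrative choice
      G = list(
        G1 = list(V = diag(4), nu = 1,       # nu = 1: the (half-)Cauchy choice;
                                             # for a multivariate term you may want
                                             # nu nearer the matrix dimension
                  alpha.mu = rep(0, 4),
                  alpha.V  = diag(4) * 25),  # prior scale of 5 for each SD
        G2 = list(V = diag(4), nu = 1,
                  alpha.mu = rep(0, 4),
                  alpha.V  = diag(4) * 25)
      )
    )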

    For the univariate case Hadfield says the t prior with V=1 is

    2 * dt(sqrt(v)/sqrt(alpha.V), df = nu, 
           ncp = alpha.mu/sqrt(alpha.V))
    

    with v restricted to be positive.

    • If we set alpha.mu=0 like sensible people, the ncp (non-centrality parameter) argument is zero;
    • the 2 out front doubles the density because we are only looking at the positive half of the distribution ("folded");
    • the division by sqrt(alpha.V) means the t-distribution for the standard deviation is scaled by sqrt(alpha.V) (equivalently, the variance is scaled by alpha.V);
    • in other words, we would get a sample from the prior SD distribution via sqrt(alpha.V) * abs(rt(1, df = nu)) (note that rt()'s degrees-of-freedom argument is df, not nu); a quick check of this follows below.
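
    A quick sanity check (base R, my illustration): simulated draws from sqrt(alpha.V) * abs(rt(n, df = nu)) should reproduce the quantiles of the folded, scaled t-distribution:

    set.seed(1)
    nu <- 1                                  # the Cauchy-tailed case
    alpha.V <- 25                            # prior scale of 5 for the SD
    sd_draws <- sqrt(alpha.V) * abs(rt(1e5, df = nu))
    p <- c(0.5, 0.9, 0.99)
    ## for |T| with T ~ t_nu, P(|T| <= q) = 2 * pt(q, nu) - 1, so invert at (1 + p) / 2
    analytic <- sqrt(alpha.V) * qt((1 + p) / 2, df = nu)
    rbind(simulated = quantile(sd_draws, p), analytic = analytic)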

    I wish Hadfield had given an example of parameter expansion in a multivariate inverse-Wishart case. Where did you find yours?


    I think nu=0.002 is a dangerous choice. Why?

    It's tempting to make your priors "as flat as possible", but this has two potentially bad consequences.

    • for some cases, such as the inverse-Gamma or Gamma distributions, a small shape parameter will also generate a big spike near zero; this is the case for your residual prior above R=list(V=diag(4), nu=0.002). This is OK if your data are informative (reasonable guess for the residual variance), but will sample badly if they're not.
    • for the parameter-expanded case we don't have the spike-at-zero problem (the half-Normal, half-t, and half-Cauchy are all flat at zero), but by making the tail so fat you are potentially allowing the sampler to visit truly ridiculous (and computationally problematic) values. (Again, not a problem if your data are highly informative.) Consider that a t-distribution with nu=1 is a Cauchy distribution, whose tails are so fat that its mean is undefined; then imagine making the shape parameter 500 times smaller (nu=0.002) ... (see the illustration below).
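
    To see how extreme nu=0.002 is, compare the upper quantiles of prior SD draws (on the unit scale) as nu shrinks; this is just the abs(rt()) sampler from above:

    set.seed(1)
    for (nu in c(1000, 1, 0.002)) {          # ~half-Normal, half-Cauchy, your prior
      q <- quantile(abs(rt(1e5, df = nu)), c(0.5, 0.9, 0.99))
      cat(sprintf("nu = %8.3f : 50%% %.3g, 90%% %.3g, 99%% %.3g\n",
                  nu, q[1], q[2], q[3]))
    }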