I'm trying to estimate the mean and covariance matrix of a multivariate normal distribution with Stan. I first import pystan and generate the data, roughly following the official YouTube tutorial for Python.
import pystan as ps
import numpy as np
data = np.random.multivariate_normal(mean=[0.7, 0], cov=[[1,1], [1,2]], size=200)
Then I specify my model. My data has shape (200,2). Since I have a multivariate distribution the mean has to be a vector and the covariance a matrix.
model = """
data
{
    int N; // Number of data points.
    vector[2] X[N]; // Values.
}
parameters
{
    vector[2] mu; // Mean
    matrix[2,2] sigma; // Covariance matrix.
}
model
{
    X ~ multi_normal(mu, sigma);
}
"""
Then I put the data in a dictionary, as shown in the Stan tutorial on YouTube, and compile the model:
my_data = {"N": 200, "X": data}
sm = ps.StanModel(model_code = model)
The model compiles without problems. However, when I try to fit the model I get a runtime error.
fit = sm.sampling(data=my_data, iter=1000, chains=4)
leads to
/usr/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
642 return self._value
643 else:
--> 644 raise self._value
645
646 def _set(self, i, obj):
RuntimeError: Initialization failed.
I'm not sure what causes this error, since my code is only a slight variation on the one in the tutorial.
I found the answer myself. In the second block of code (the Stan model) we need to replace matrix[2,2] with cov_matrix[2].
matrix[2,2] sigma; // Covariance matrix.
Then becomes
cov_matrix[2] sigma; // Covariance matrix.
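For reference, the whole program with this one-line change might look like the sketch below. The names MODEL_CODE, build_data and fit_model are hypothetical helpers introduced only for illustration; the PyStan calls are the same ones used in the question. Deriving N from the data rather than hard-coding 200 is an optional tweak, not part of the fix.

```python
# Stan program from the question, with cov_matrix[2] in place of matrix[2,2].
MODEL_CODE = """
data
{
    int N; // Number of data points.
    vector[2] X[N]; // Values.
}
parameters
{
    vector[2] mu; // Mean
    cov_matrix[2] sigma; // Covariance matrix (symmetric, positive definite).
}
model
{
    X ~ multi_normal(mu, sigma);
}
"""


def build_data(samples):
    """Build the data dictionary; N is taken from the samples (hypothetical helper)."""
    return {"N": len(samples), "X": samples}


def fit_model(samples, iterations=1000, chains=4):
    """Compile the model and draw samples (hypothetical helper around the PyStan calls)."""
    # Imported lazily so the rest of the sketch loads even without PyStan installed.
    import pystan as ps

    sm = ps.StanModel(model_code=MODEL_CODE)
    return sm.sampling(data=build_data(samples), iter=iterations, chains=chains)
```

With the array from the first code block, `fit = fit_model(data)` would replace the separate StanModel and sampling calls.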
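Stan has a dedicated data type, cov_matrix, for symmetric, positive-definite matrices such as covariance matrices. A plain matrix[2,2] is unconstrained, so its random initial values are almost never symmetric and positive definite; multi_normal rejects them, which is why initialization failed. With this simple substitution the code runs without throwing an error.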
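The initialization failure can be illustrated without Stan at all. The sketch below is a plain-Python simulation, not Stan's actual code: it assumes each entry of an unconstrained 2x2 matrix starts uniformly in (-2, 2), uses an illustrative symmetry tolerance of 1e-8, and counts how many of 100 random starting matrices would pass as covariance matrices. The function names are made up for this example.

```python
import random


def is_valid_covariance(m, tol=1e-8):
    """Return True if the 2x2 matrix m is symmetric and positive definite."""
    (a, b), (c, d) = m
    symmetric = abs(b - c) <= tol  # tol is an illustrative assumption
    # Sylvester's criterion for a 2x2 matrix:
    # both leading principal minors must be positive.
    positive_definite = a > 0 and a * d - b * c > 0
    return symmetric and positive_definite


def count_valid_inits(attempts=100, seed=0):
    """Count random unconstrained starting matrices that are valid covariances."""
    rng = random.Random(seed)
    valid = 0
    for _ in range(attempts):
        # Assumption: every entry is drawn independently from uniform(-2, 2).
        m = [[rng.uniform(-2, 2), rng.uniform(-2, 2)],
             [rng.uniform(-2, 2), rng.uniform(-2, 2)]]
        if is_valid_covariance(m):
            valid += 1
    return valid


true_cov = [[1, 1], [1, 2]]
print(is_valid_covariance(true_cov))  # True: symmetric, with minors 1 and 1
print(count_valid_inits())
```

Because the two off-diagonal entries are drawn independently, they practically never match, so an unconstrained matrix has essentially no chance of being accepted. A cov_matrix parameter avoids this because every value the sampler proposes is a valid covariance matrix by construction.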