I'm trying to write code using TensorFlow Probability to classify a set of samples (coming from multiple Gaussian distributions) using the EM algorithm.
I want this code to work for any generic problem (for example, whether the samples come from 2 Gaussian distributions or from 8).
The problem that I have right now is that I can't find a way to create an array of tfd.Normal distributions.
I want to have it as an array (or another similar type of data) because this way I can work with an indeterminate number of distributions.
Can anyone help me with this problem?
Would the following code be a solution?
import numpy as np
import tensorflow_probability as tfp
tfd = tfp.distributions
true_mu = np.array([20,60], dtype=np.float64)
true_sigma = np.array([8,4], dtype=np.float64)
true_dist = tfd.Normal(loc=true_mu, scale=true_sigma)
TFP distributions are batch-capable out of the box. Your code should work; it represents a batch of 2 normal distributions, where the first is N(X|20, 8) and the second is N(X|60, 4). Here the second parameter is the scale, i.e. the standard deviation.
You can check this with true_dist.batch_shape (which will return [2] in this case).
You can now sample: true_dist.sample() (returns a float64 tensor with shape [2]).
You can compute log-probabilities: true_dist.log_prob(0) (returns a float64 tensor with shape [2], representing [log N(0|20, 8), log N(0|60, 4)]).
You can evaluate a different point for each batch member: true_dist.log_prob([0, 1]) (returns a float64 tensor with shape [2], representing [log N(0|20, 8), log N(1|60, 4)]).
Also note that TFP distributions broadcast their parameters, so if you want two normals with the same loc and different scale, you can write tfd.Normal(0., [10., 20.]).