
How can I create an array of distributions in TensorFlow Probability?


I'm trying to write code using TensorFlow Probability to classify a set of samples (coming from multiple Gaussian distributions) using the EM algorithm.

I want this code to work for any generic problem, whether the samples come from 2 Gaussian distributions or 8.

The problem that I have right now is that I can't find a way to create an array of tfd.Normal.

I want to have it as an array (or another similar type of data) because this way I can work with an indeterminate number of distributions.

Can anyone help me with this problem?

Would the following code be a solution?

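import numpy as np
import tensorflow_probability as tfp
tfd = tfp.distributions  # conventional alias for the distributions module
true_mu = np.array([20, 60], dtype=np.float64)    # one mean per distribution
true_sigma = np.array([8, 4], dtype=np.float64)   # one standard deviation per distribution
true_dist = tfd.Normal(loc=true_mu, scale=true_sigma)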

Solution

  • TFP distributions are batch-capable out of the box. Your code should work: it represents a vector of 2 normal distributions, where the first is N(X|20, 8) and the second is N(X|60, 4). Note that the second argument is the scale, i.e. the standard deviation, not the variance.

    You can query this by true_dist.batch_shape (which will return [2] in this case).

    You can now sample: true_dist.sample() (returns a float64 tensor with shape [2], one draw per distribution).

    You can compute log-probabilities: true_dist.log_prob(0) (returns a float64 tensor with shape [2], representing [log N(0|20, 8), log N(0|60, 4)]).

    You can evaluate a different point for each batch member: true_dist.log_prob([0, 1]) (returns a float64 tensor with shape [2], representing [log N(0|20, 8), log N(1|60, 4)]).

    Also note that TFP distributions broadcast their parameters, so if you want two normals with the same loc and different scale, you can write tfd.Normal(0., [10., 20.]).