Tags: python, tensorflow, lstm, recurrent-neural-network, mixture-model

Tensorflow ValueError: Dimensions must be equal: LSTM+MDN


I am trying to make a next-word prediction model with an LSTM + Mixture Density Network, based on this implementation (https://www.katnoria.com/mdn/).

Input: 300-dimensional word vectors × window size (5), plus a 21-dimensional array (c) representing the topic distribution of the document, which is used to initialize the LSTM hidden states.

Output: mixing coefficients (num_gaussians), variances (num_gaussians), and means (num_gaussians × 300, the vector size).

With an experimental 161 observations, x.shape, y.shape, and c.shape give me:

(TensorShape([161, 5, 300]), TensorShape([161, 300]), TensorShape([161, 21]))
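(For concreteness, stand-in tensors with these shapes can be created like this; random values, just to make the shapes explicit:)

import tensorflow as tf

x = tf.random.normal((161, 5, 300))  # window of 5 word vectors, 300-d each
y = tf.random.normal((161, 300))     # target next-word vector
c = tf.random.normal((161, 21))      # topic distribution of the document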

from tensorflow.keras.layers import Input, Dense, LSTM, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.math import exp

# n_feat is size of word vector
n_feat = 300
window = 5
l = (window, n_feat)
hidden_state_dim = 21

# Number of gaussians to represent the multimodal distribution
k = 26

# Initial states: map the topic distribution (c) to the LSTM initial states
mlp_inp = Input(shape=(hidden_state_dim,))
mlp_dense_h = Dense(128, activation='relu', name="dense_h")(mlp_inp)
mlp_dense_c = Dense(128, activation='relu', name="dense_c")(mlp_inp)

# Network
input = Input(shape=l)
layer1 = LSTM(128, return_sequences=True, name='baselayer1')(input, initial_state=[mlp_dense_h, mlp_dense_c])
layer2 = LSTM(128, name='baselayer2')(layer1)

# Mean
mu = Dense((n_feat * k), activation=None, name='mean_layer')(layer2)
# variance (should be greater than 0 so we exponentiate it)
var_layer = Dense(k, activation=None, name='dense_var_layer')(layer2)
var = Lambda(lambda x: exp(x), output_shape=(k,), name='variance_layer')(var_layer)
# mixing coefficient should sum to 1.0
pi = Dense(k, activation='softmax', name='pi_layer')(layer2)
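(The Model(...) call isn't shown above; presumably the model is assembled along these lines, a sketch where the input order matches the summary below:)

model = Model(inputs=[mlp_inp, input], outputs=[pi, mu, var])
model.summary()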

Below is the .summary() of my model:

Model: "model_12"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_7 (InputLayer)            [(None, 21)]         0                                            
__________________________________________________________________________________________________
input_8 (InputLayer)            [(None, 5, 300)]     0                                            
__________________________________________________________________________________________________
dense_h (Dense)                 (None, 128)          2816        input_7[0][0]                    
__________________________________________________________________________________________________
dense_c (Dense)                 (None, 128)          2816        input_7[0][0]                    
__________________________________________________________________________________________________
baselayer1 (LSTM)               (None, 5, 128)       219648      input_8[0][0]                    
                                                                 dense_h[0][0]                    
                                                                 dense_c[0][0]                    
__________________________________________________________________________________________________
baselayer2 (LSTM)               (None, 128)          131584      baselayer1[0][0]                 
__________________________________________________________________________________________________
dense_var_layer (Dense)         (None, 26)           3354        baselayer2[0][0]                 
__________________________________________________________________________________________________
pi_layer (Dense)                (None, 26)           3354        baselayer2[0][0]                 
__________________________________________________________________________________________________
mean_layer (Dense)              (None, 7800)         1006200     baselayer2[0][0]                 
__________________________________________________________________________________________________
variance_layer (Lambda)         (None, 26)           0           dense_var_layer[0][0]            
==================================================================================================
Total params: 1,369,772
Trainable params: 1,369,772
Non-trainable params: 0
__________________________________________________________________________________________________

However, when I try to run the training process, I get the following error:

ValueError: in user code:

    <ipython-input-70-084e2be19035>:7 train_step  *
        loss = mdn_loss(y, pi_, mu_, var_)
    <ipython-input-67-9a3cf3d4ccd2>:18 mdn_loss  *
        out = calc_pdf(y_true, mu, var)
    <ipython-input-67-9a3cf3d4ccd2>:6 calc_pdf  *
        value = tf.subtract(y, mu)**2
.....
ValueError: Dimensions must be equal, but are 300 and 7800 for '{{node Sub}} = Sub[T=DT_FLOAT](y, model_15/mean_layer/BiasAdd)' with input shapes: [161,300], [161,7800].

It tells me that there is a problem with the dimensions of the tensors passed to tf.subtract() in calc_pdf():

import numpy as np
import tensorflow as tf

# Note how easy it is to write the loss function in the new
# TensorFlow eager mode (debugging the function becomes intuitive too)
def calc_pdf(y, mu, var):
    """Calculate component density"""
    value = tf.subtract(y, mu)**2
    value = (1/tf.math.sqrt(2 * np.pi * var)) * tf.math.exp((-1/(2*var)) * value)
    return value


def mdn_loss(y_true, pi, mu, var):
    """MDN Loss Function
    The eager mode in TensorFlow 2.0 makes it extremely easy to write
    functions like these. It feels a lot more pythonic to me.
    """
    out = calc_pdf(y_true, mu, var)
    # multiply with each pi and sum it
    out = tf.multiply(out, pi)
    out = tf.reduce_sum(out, 1, keepdims=True)
    out = -tf.math.log(out + 1e-10)
    return tf.reduce_mean(out)
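(The train_step named in the traceback isn't shown above; it is presumably along these lines, a sketch where the optimizer and the input order are my assumptions:)

from tensorflow.keras.optimizers import Adam

optimizer = Adam()

@tf.function
def train_step(c, x, y):
    # Sketch of the training step from the traceback; `model` is the
    # assembled Model from above, mdn_loss/calc_pdf are defined here.
    with tf.GradientTape() as tape:
        pi_, mu_, var_ = model([c, x], training=True)
        loss = mdn_loss(y, pi_, mu_, var_)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss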

but I don't understand how to fix this. I checked the original implementation (in the link above), which used 4000 observations, 1 feature, and 26 distributions, so this function received shapes [4000, 1] and [4000, 26] and worked fine. I feel like it should work with [161, 300] and [161, 7800] as well, but it doesn't.
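(A quick shape check of the two cases, with zero tensors just to exercise the broadcasting behavior:)

import tensorflow as tf

a = tf.zeros((4000, 1))
b = tf.zeros((4000, 26))
print(tf.subtract(a, b).shape)  # (4000, 26): the trailing 1 broadcasts to 26

a2 = tf.zeros((161, 300))
b2 = tf.zeros((161, 7800))
# tf.subtract(a2, b2)  # fails: 300 and 7800 are not broadcast-compatible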

How can I fix this?

(I've checked similar questions regarding "dimensions must be equal" but could not figure out how to make them work for this particular implementation.)

I can post additional info or code if this is not enough. I would really appreciate an answer!


Solution

  • For an MDN model, the likelihood of each sample has to be evaluated under every Gaussian's pdf. To do that, I think you have to reshape your tensors (y_true and mu) and take advantage of broadcasting by adding 1 as the last dimension of y, e.g. (a fuller sketch follows this snippet):

    def calc_pdf(y, mu, var):
        """Calculate component density"""
        # Reshape so the subtraction broadcasts across the 26 Gaussians:
        # y (161, 300, 1) against mu (161, 300, 26) -> value (161, 300, 26)
        y = tf.reshape(y, (161, 300, 1))
        mu = tf.reshape(mu, (161, 300, 26))
        value = tf.subtract(y, mu)**2
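
    Building on that, here is a minimal batch-size-agnostic sketch (my assumptions: n_feat = 300 and k = 26 as defined in the question, and one shared variance per component across all 300 dimensions, matching the model's var head; this is not the original author's exact code):

    import numpy as np
    import tensorflow as tf

    def calc_pdf(y, mu, var):
        """Per-component Gaussian density, broadcast over the k mixtures.

        y:   (batch, n_feat)     -> (batch, n_feat, 1)
        mu:  (batch, n_feat * k) -> (batch, n_feat, k)
        var: (batch, k)          -> (batch, 1, k)
        """
        y = tf.reshape(y, (-1, n_feat, 1))
        mu = tf.reshape(mu, (-1, n_feat, k))
        var = tf.reshape(var, (-1, 1, k))
        value = tf.subtract(y, mu)**2
        pdf = (1 / tf.math.sqrt(2 * np.pi * var)) * tf.math.exp(-value / (2 * var))
        # Product over the n_feat axis gives one density per Gaussian, shape
        # (batch, k), which is what mdn_loss expects; summing log-densities
        # would be more numerically stable for 300 dimensions.
        return tf.reduce_prod(pdf, axis=1)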