Tags: python, tensorflow, keras, neural-network, log-likelihood

Mixture Density Network (MDN) returning probabilities 1.0 & 0.0 only


I am building a mixture density network that forecasts the distribution of one variable conditional on two covariates. One covariate is hourly, while the other does not vary within the day (i.e., it is daily data). Prior work showed that two mixture components should give good results, so I am using two components as well.

For the loss function, I'm using a custom-built negative log-likelihood for a mixture of normal distributions, to which I apply the log-sum-exp trick for numerical stability.
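Written out (with SB = 1, as in the code below), the per-sample quantity being minimized is the negative log-likelihood of a two-component Gaussian mixture, evaluated via log-sum-exp:

-\log p(y \mid x) = -\log \sum_{k=1}^{2} \alpha_k(x)\,\mathcal{N}\big(y \mid \mu_k(x),\, \sigma_k(x)^2\big)
                  = -\operatorname{logsumexp}_{k}\Big[\log\alpha_k - \tfrac{1}{2}\log(2\pi) - \log\sigma_k - \frac{(y-\mu_k)^2}{2\sigma_k^2}\Big]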

I'm using two hidden layers with the ReLU activation function and 60 neurons each, a batch size of 60, and a learning rate of 1e-4.

However, the results always show a probability of 1.0 for one of the components and 0.0 for the other, and increasing the number of epochs makes no difference. The results for the component with probability 1.0 are perfectly reasonable, but given the prior work on this topic, it is hard to believe that across 60K+ hours not a single one is best described by a mixture of the two distributions.

Any suggestions on how to correct the probabilities, or on what could be causing the 0/1-only mixture weights, would be highly appreciated.

import numpy as np
import tensorflow as tf
from tensorflow import keras as ks
from tensorflow.keras import backend as bk

# reading inputs, etc.

components = 2 # Number of normal distributions in mixture
no_parameters = 3 # Number of parameters of the mixtures (weight, mean, std. dev)
neurons = 60 # Number of neurons per layer
SB = 1 # Number of outputs we want to predict

# Make the input tensor: two covariates, quantity and price.
inputs = ks.Input(shape=(X_train.shape[1],))

h1 = ks.layers.Dense(neurons, activation="relu",
                     kernel_initializer='ones', bias_initializer='ones')(inputs)
h2 = ks.layers.Dense(neurons, activation="relu",
                     kernel_initializer='ones', bias_initializer='ones')(h1)
alphas = ks.layers.Dense(components, activation="softmax", name="alphas",
                         kernel_initializer='ones', bias_initializer='ones')(h2)
mus = ks.layers.Dense(components, name="mus")(h2)
sigmas = ks.layers.Dense(components, activation="relu", name="sigmas",
                         kernel_initializer='ones', bias_initializer='ones')(h2)
outputVector = ks.layers.Concatenate(name="output")([alphas, mus, sigmas])

model = ks.Model(inputs=inputs, outputs=outputVector)

def slice_parameter_vectors(parameter_vector):
    """ Returns an unpacked list of parameter vectors. """
    return [parameter_vector[:, i * components:(i + 1) * components] for i in range(no_parameters)]

def log_sum_exp(x, axis=None):
    """Log-sum-exp trick implementation"""
    x_max = bk.max(x, axis=axis, keepdims=True)
    return bk.log(bk.sum(bk.exp(x - x_max),
                         axis=axis, keepdims=True)) + x_max

def mean_log_Gaussian_like2(y, parameter_vector):
    """ Computes the mean negative log-likelihood loss of the observed price given the mixture parameters. """
    alpha, mu, sigma = slice_parameter_vectors(parameter_vector)  # Unpack parameter vectors
    mu = tf.keras.backend.reshape(mu, [-1, SB, 2])
    alpha = bk.softmax(bk.clip(alpha, 1e-8, 1.))
    exponent = bk.log(alpha) - .5 * float(SB) * bk.log(2 * np.pi) \
               - float(SB) * bk.log(sigma) \
               - bk.sum((bk.expand_dims(y, 2) - mu) ** 2, axis=1) / (2 * (sigma) ** 2)
    log_likelihood = log_sum_exp(exponent, axis=1)
    return -bk.mean(log_likelihood)

model.compile(optimizer=ks.optimizers.Adam(learning_rate=1e-4, clipvalue=1.0), # , clipvalue=0.5
              loss= mean_log_Gaussian_like2,
              metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=60, epochs=500)

y_pred = model.predict(X_test)
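To inspect the fitted mixture weights directly, the prediction vector can be sliced the same way the Concatenate layer packs it (a minimal sketch; alpha_pred, mu_pred and sigma_pred are just illustrative names):

# y_pred has layout [alphas | mus | sigmas], each block with `components` columns,
# matching the Concatenate layer above.
alpha_pred = y_pred[:, 0:components]               # mixture weights
mu_pred = y_pred[:, components:2 * components]     # component means
sigma_pred = y_pred[:, 2 * components:]            # component std. devs

print(alpha_pred[:5])  # this is where every row comes out as [1.0, 0.0]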

Solution

  • I solved this issue. The fix was to remove the extra softmax applied to alpha inside the loss function: alpha = bk.softmax(bk.clip(alpha, 1e-8, 1.)) should be alpha = bk.clip(alpha, 1e-8, 1.). The "alphas" output layer already applies a softmax, so the mixture weights were being normalized twice. The corrected loss is sketched below. Thank you everyone.
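For completeness, this is what the loss looks like with that change applied (a minimal sketch; only the alpha line differs from the version in the question):

def mean_log_Gaussian_like2(y, parameter_vector):
    """ Mean negative log-likelihood of the observed price under the mixture parameters. """
    alpha, mu, sigma = slice_parameter_vectors(parameter_vector)  # Unpack parameter vectors
    mu = tf.keras.backend.reshape(mu, [-1, SB, 2])
    alpha = bk.clip(alpha, 1e-8, 1.)  # no extra softmax: the "alphas" layer already applies one
    exponent = bk.log(alpha) - .5 * float(SB) * bk.log(2 * np.pi) \
               - float(SB) * bk.log(sigma) \
               - bk.sum((bk.expand_dims(y, 2) - mu) ** 2, axis=1) / (2 * (sigma) ** 2)
    log_likelihood = log_sum_exp(exponent, axis=1)
    return -bk.mean(log_likelihood)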