python, tensorflow, tensorflow2.0, tensor, tensorflow-probability

Invalid argument error in TensorFlow 2 with a self-defined loss function, although everything seems to be correct


I am currently using TensorFlow 2 to train models that not only provide point forecasts for time series, but also parameters of the forecast distribution (e.g. mean and variance). For this I create a custom layer and modify the loss function to optimize the corresponding parameters. For the one-dimensional case with only one predicted time series, this works very well.
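
To make the approach clearer, the univariate layer and loss look roughly like this (a condensed sketch of the working example from the linked directory below; the function names and the assumed target shape are only illustrative):

import tensorflow as tf
import tensorflow_probability as tfp

def negative_normdist_layer_1(x):
    # Separate the parameters
    mu, sigma = tf.unstack(x, num=2, axis=-1)
    # Apply a softplus to make the scale positive
    sigma = tf.keras.activations.softplus(sigma)
    # Join back together again
    return tf.stack((mu, sigma), axis=-1)

def negative_normdist_loss_1(y_true, y_pred):
    # Negative log likelihood of a univariate normal distribution,
    # assuming targets of shape (batch_size, 1)
    mu, sigma = tf.unstack(y_pred, num=2, axis=-1)
    dist = tfp.distributions.Normal(loc=mu, scale=sigma)
    return tf.reduce_mean(-dist.log_prob(y_true[..., 0]))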

For the case with two time series I wanted to predict correlations as well and used "MultivariateNormalFullCovariance" from "tensorflow_probability". With this, however, I get the following error:

InvalidArgumentError:  Input matrix must be square.
     [[node negative_normdist_loss_2/MultivariateNormalFullCovariance/init/Cholesky (defined at d:\20_programming\python\virtualenvs\tensorflow-gpu-2\lib\site-packages\tensorflow_probability\python\distributions\mvn_full_covariance.py:194) ]] [Op:__inference_train_function_1133]

Errors may have originated from an input operation.
Input Source operations connected to node negative_normdist_loss_2/MultivariateNormalFullCovariance/init/Cholesky:
 negative_normdist_loss_2/MultivariateNormalFullCovariance/init/covariance_matrix (defined at d:\20_programming\python\virtualenvs\tensorflow-gpu-2\lib\site-packages\tensorflow_probability\python\distributions\mvn_full_covariance.py:181)

Function call stack:
train_function

I am aware that something is wrong with the input dimensions, but unfortunately I have not been able to pinpoint the specific error. (The covariance matrix is already square, even if it contains the same parameter twice.)

The code itself is a bit extensive. Therefore I have uploaded a working (univariate) and a non-working (multivariate) example, including sample data, to this directory:

https://drive.google.com/drive/folders/1IIAtKDB8paWV0aFVFALDUAiZTCqa5fAN?usp=sharing

For a better overview I have also copied the essential routines below:

def negative_normdist_layer_2(x):
    # Get the number of dimensions of the input
    num_dims = len(x.get_shape())
    # Separate the parameters
    mu1, mu2, sigma11, sigma12, sigma22 = tf.unstack(x, num=5, axis=-1)
    # Add one dimension to make the right shape
    mu1 = tf.expand_dims(mu1, -1)
    mu2 = tf.expand_dims(mu2, -1)
    sigma11 = tf.expand_dims(sigma11, -1)
    sigma12 = tf.expand_dims(sigma12, -1)
    sigma22 = tf.expand_dims(sigma22, -1)
    # Apply a softplus to make positive
    sigma11 = tf.keras.activations.softplus(sigma11)
    sigma22 = tf.keras.activations.softplus(sigma22)
    # Join back together again
    out_tensor = tf.concat((mu1, mu2, sigma11, sigma12, sigma22), axis=num_dims-1)
    return out_tensor

def negative_normdist_loss_2(y_true, y_pred):
    # Separate the parameters
    mu1, mu2, sigma11, sigma12, sigma22 = tf.unstack(y_pred, num=5, axis=-1)
    # Add one dimension to make the right shape
    mu1 = tf.expand_dims(mu1, -1)
    mu2 = tf.expand_dims(mu2, -1)
    sigma11 = tf.expand_dims(sigma11, -1)
    sigma12 = tf.expand_dims(sigma12, -1)
    sigma22 = tf.expand_dims(sigma22, -1)
    # Calculate the negative log likelihood
    dist = tfp.distributions.MultivariateNormalFullCovariance(
        loc = [mu1, mu2], 
        covariance_matrix = [[sigma11, sigma12], [sigma12, sigma22]]
    )
    nll = tf.reduce_mean(-dist.log_prob(y_true))
    return nll

# Define inputs with predefined shape
input_shape = lookback // step, float_data.shape[-1]
inputs = Input(shape=input_shape)

# Build network with some predefined architecture
output1 = Flatten()(inputs)
output2 = Dense(32)(output1)

# Predict the parameters of a negative normdist distribution
outputs = Dense(5)(output2)
distribution_outputs = Lambda(negative_normdist_layer_2)(outputs)

# Construct model
model_norm_2 = Model(inputs=inputs, outputs=distribution_outputs)

opt = Adam()
model_norm_2.compile(loss = negative_normdist_loss_2, optimizer = opt)

history_norm_2 = model_norm_2.fit_generator(train_gen_mult,
                                            steps_per_epoch=500,
                                            epochs=20,
                                            validation_data=val_gen_mult,
                                            validation_steps=val_steps)

The operating system I use is Windows 10 and the Python version is 3.6. All libraries listed in the sample code are at their latest versions, including tensorflow-gpu.

I would be very grateful if the exact cause of the error could be determined and a solution be found.


Solution

  • The mean and covariance parameters have to be transposed, because according to the documentation of MultivariateNormalFullCovariance they are expected to have shapes (batch_size, 2) and (batch_size, 2, 2) (for a problem of dimension 2). Even with the layer that makes the diagonal terms positive, there were still problems inverting the covariance matrix. You can use MultivariateNormalTriL instead, which takes a lower triangular matrix; with it (and keeping the softplus) there are no more problems with the covariance inversion:

    def negative_normdist_loss_2(y_true, y_pred):
        # Separate the parameters
        mu1, mu2, sigma11, sigma12, sigma22 = tf.unstack(y_pred, num=5, axis=-1)
        # Stack and transpose so that the mean has shape (batch_size, 2)
        mu = tf.transpose([mu1, mu2], perm=[1, 0])
        # Build the lower triangular scale matrix with shape (batch_size, 2, 2)
        sigma_tril = tf.transpose([[sigma11, tf.zeros_like(sigma11)], [sigma12, sigma22]], perm=[2, 0, 1])
        # Calculate the negative log likelihood
        dist = tfp.distributions.MultivariateNormalTriL(loc=mu, scale_tril=sigma_tril)
        nll = tf.reduce_mean(-dist.log_prob(y_true))
        return nll
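
    One quick way to check that the shapes now match what the documentation expects is to run the loss on dummy tensors outside the model (in eager mode; the sizes and values below are purely illustrative):

    batch_size = 8
    raw = tf.random.normal((batch_size, 5))       # stand-in for the Dense(5) output
    y_pred = negative_normdist_layer_2(raw)       # softplus applied to the diagonal terms
    y_true = tf.random.normal((batch_size, 2))    # stand-in for the two target series

    mu1, mu2, sigma11, sigma12, sigma22 = tf.unstack(y_pred, num=5, axis=-1)
    print(tf.transpose([mu1, mu2], perm=[1, 0]).shape)                # (8, 2)
    print(tf.transpose([[sigma11, tf.zeros_like(sigma11)],
                        [sigma12, sigma22]], perm=[2, 0, 1]).shape)   # (8, 2, 2)
    print(negative_normdist_loss_2(y_true, y_pred))                   # scalar loss, no shape error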
    

    However, I am wondering about the idea behind it. It corresponds to an unsupervised approach, which is interesting. The data allows you to estimate mean and covariance parameters with a somewhat unconventional cost function, but it is not clear what you can do with them afterwards.