I am trying to write a custom loss function for a model that uses Monte Carlo (MC) dropout. I want the model to run through each sample in a batch n times before the predictions are fed to the loss function. A toy version of the current code is shown below. The model has 24 inputs and 10 outputs, with 5000 training samples.
import numpy as np
import tensorflow as tf

X = np.random.rand(5000, 24)
y = np.random.rand(5000, 10)

def MC_Loss(y_true, y_pred):
    # Mean and variance are reduced over axis 0, i.e. over the batch,
    # so both come out with shape (10,) instead of one value per sample.
    mu = tf.math.reduce_mean(y_pred, axis=0)
    #error = tf.square(y_true - mu)
    error = tf.square(y_true - y_pred)
    var = tf.math.reduce_variance(y_pred, axis=0)
    # Gaussian negative log-likelihood
    return tf.math.divide(error, var)/2 + tf.math.log(var)/2 + tf.math.log(2*np.pi)/2

input_layer = tf.keras.layers.Input(shape=(X.shape[1],))
hidden_layer = tf.keras.layers.Dense(units=100, activation='elu')(input_layer)
# training=True keeps dropout active at inference time (MC dropout)
do_layer = tf.keras.layers.Dropout(rate=0.20)(hidden_layer, training=True)
output_layer = tf.keras.layers.Dense(units=10, activation='sigmoid')(do_layer)

model = tf.keras.models.Model(input_layer, output_layer)
model.compile(loss=MC_Loss, optimizer='Adam')
model.fit(X, y, epochs=100, batch_size=128, shuffle=True)
The current shapes of y_true and y_pred are (None, 10), with None being the batch_size. I want to have n values for each sample in the batch, so I can get the mean and standard deviation for each sample to use in the loss function. I want these values because the mean and standard deviation should be unique to each sample, not taken across all samples in a batch. The current shapes of mu and sigma are (10,), and I would want them to be (None, 10), which would mean y_true and y_pred have the shape (None, n, 10).
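For illustration (a minimal sketch with a hypothetical dummy batch, not part of the training code), reducing over axis=0 collapses the batch dimension, which is why mu and sigma currently come out with shape (10,):

import numpy as np
import tensorflow as tf

# Hypothetical dummy batch: 128 samples, 10 outputs each
y_pred = tf.constant(np.random.rand(128, 10), dtype=tf.float32)

mu = tf.math.reduce_mean(y_pred, axis=0)       # mean over the batch
var = tf.math.reduce_variance(y_pred, axis=0)  # variance over the batch
print(mu.shape, var.shape)                     # (10,) (10,) -- one statistic per output, not per sample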
How can I accomplish this?
I believe I found the solution after some experimentation. The modified code is shown below.
import numpy as np
import tensorflow as tf

n = 100  # number of MC copies per sample

X = np.random.rand(5000, 24)
# Stack n identical copies of every input along a new intermediate axis -> (5000, n, 24)
X1 = np.concatenate([X.reshape(X.shape[0], 1, X.shape[1]) for _ in range(n)], axis=1)
y = np.random.rand(5000, 10)
# Same for the targets -> (5000, n, 10)
y1 = np.concatenate([y.reshape(y.shape[0], 1, y.shape[1]) for _ in range(n)], axis=1)

def MC_Loss(y_true, y_pred):
    # Reduce over the intermediate axis (the n MC copies), keeping one value per sample
    mu = tf.math.reduce_mean(y_pred, axis=1)     # (None, 10)
    obs = tf.math.reduce_mean(y_true, axis=1)    # (None, 10); all n copies are identical
    error = tf.square(obs - mu)
    var = tf.math.reduce_variance(y_pred, axis=1)
    # Gaussian negative log-likelihood
    return tf.math.divide(error, var)/2 + tf.math.log(var)/2 + tf.math.log(2*np.pi)/2

input_layer = tf.keras.layers.Input(shape=(X.shape[1],))
hidden_layer = tf.keras.layers.Dense(units=100, activation='elu')(input_layer)
do_layer = tf.keras.layers.Dropout(rate=0.20)(hidden_layer, training=True)
output_layer = tf.keras.layers.Dense(units=10, activation='sigmoid')(do_layer)

model = tf.keras.models.Model(input_layer, output_layer)
model.compile(loss=MC_Loss, optimizer='Adam')
model.fit(X1, y1, epochs=100, batch_size=128, shuffle=True)
So what I am now doing is stacking the inputs and outputs along an intermediate axis, creating n identical copies of every input and output sample. TensorFlow shows a warning because the model is created without knowledge of this intermediate axis, but it still trains with no issues and the shapes are as expected.
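As a quick sanity check (a sketch assuming the variables X1, y1, and n defined above), the stacked arrays and the per-sample statistics inside the loss look like this:

print(X1.shape)  # (5000, 100, 24) -- n identical copies of each input
print(y1.shape)  # (5000, 100, 10) -- n identical copies of each target
# Inside MC_Loss, y_pred has shape (batch, n, 10); reducing over axis=1
# gives mu, obs, and var with shape (batch, 10), i.e. (None, 10) symbolically.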
Note: since y_true now has the shape (None, n, 10), you have to take the mean over the intermediate axis, which recovers the true value since all n copies are identical.
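For completeness, here is a minimal sketch of how the same stacking could be used at prediction time to get a per-sample MC-dropout mean and standard deviation (the test data below is hypothetical; model and n come from the code above):

# Stack the test inputs the same way as the training inputs
X_test = np.random.rand(32, 24)
X_test_stacked = np.concatenate([X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
                                 for _ in range(n)], axis=1)   # (32, n, 24)

# Dropout stays active because the layer was called with training=True,
# so the n copies of each sample get different dropout masks.
preds = model.predict(X_test_stacked)   # (32, n, 10)
mean_pred = preds.mean(axis=1)          # (32, 10) per-sample MC mean
std_pred = preds.std(axis=1)            # (32, 10) per-sample MC standard deviation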