I am training a neural network for time series regression. The model is:
####################################################################################################################
# Define ANN Model
import tensorflow as tf
from tensorflow.keras import layers, backend as K
from tensorflow.keras.layers import (Conv1D, MaxPooling1D, Bidirectional, LSTM,
                                     Reshape, Flatten, Dense)
from tensorflow.keras.models import Model

# define two sets of inputs
acc = layers.Input(shape=(3, 1))
gyro = layers.Input(shape=(3, 1))
# the first branch operates on the first input
x = Conv1D(256, 1, activation='relu')(acc)
x = Conv1D(128, 1, activation='relu')(x)
x = Conv1D(128, 1, activation='relu')(x)
x = MaxPooling1D(pool_size=3)(x)
x = Model(inputs=acc, outputs=x)
# the second branch operates on the second input
y = Conv1D(256, 1, activation='relu')(gyro)
y = Conv1D(128, 1, activation='relu')(y)
y = Conv1D(128, 1, activation='relu')(y)
y = MaxPooling1D(pool_size=3)(y)
y = Model(inputs=gyro, outputs=y)
# combine the outputs of the two branches
combined = layers.concatenate([x.output, y.output])
# combined outputs
z = Bidirectional(LSTM(128, dropout=0.25, return_sequences=False, activation='tanh'))(combined)
# reshape the (batch, 256) BiLSTM output into a (batch, 256, 1) sequence for the second BiLSTM
z = Reshape((256, 1))(z)
z = Bidirectional(LSTM(128, dropout=0.25, return_sequences=False, activation='tanh'))(z)
#z = Dense(10, activation="relu")(z)
z = Flatten()(z)
z = Dense(4, activation="linear")(z)
model = Model(inputs=[x.input, y.input], outputs=z)
model.compile(loss=loss, optimizer=tf.keras.optimizers.Adam(), metrics=['mse'], run_eagerly=True)
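As a quick shape check (just a sketch with random dummy data; the batch size of 8 and the array names are arbitrary), the model should map the two (batch, 3, 1) inputs to one quaternion per sample:
import numpy as np

dummy_acc = np.random.rand(8, 3, 1).astype('float32')   # hypothetical accelerometer batch
dummy_gyro = np.random.rand(8, 3, 1).astype('float32')  # hypothetical gyroscope batch
print(model([dummy_acc, dummy_gyro]).shape)              # expected: (8, 4)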
I have tried to implement a custom loss function (based on different papers).
Math
The error is calculated as follows:
y_pred = [w x y z]
y_true = [w1 x1 y1 z1]
error = 2 * acos(w*w1 + x*x1 + y*y1 + z*z1)
Based on this formula I wrote the custom loss function:
def loss(y_true, y_pred):
    z = y_true * y_pred
    wtot = tf.reduce_sum(z, axis=1)  # dot product of the two quaternions
    error = 2 * tf.math.acos(K.clip(tf.math.sqrt(wtot * wtot), -1., 1.))
    return error
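As a quick sanity check of this formula (a sketch with hand-picked example quaternions, not part of the training code), identical quaternions should give an error of 0 and quaternions 90 degrees apart should give roughly pi/2:
q_identity = tf.constant([[1., 0., 0., 0.]])
q_rot90_z = tf.constant([[0.70710678, 0., 0., 0.70710678]])  # 90 degree rotation about z

print(loss(q_identity, q_identity).numpy())  # -> [0.]
print(loss(q_identity, q_rot90_z).numpy())   # -> [~1.5708], i.e. pi/2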
But while the loss value is decreasing, the MSE increases, and I can see an offset in the output that grows with the number of epochs. I understand that we are not optimizing this network for MSE, but based on the mathematics the MSE should also be reduced or converge to some value near 1.
(Plots of the network output after 1, 10, and 50 epochs: orange is the target/reference, blue is the network output.)
To solve this problem, I used the geometric distance equation to compute the loss value:
def QQuat_mult(y_true, y_pred):
"""
The function takes in two quaternions, normalizes the first one, and then multiplies the two
quaternions together.
The function returns the absolute value of the vector part of the resulting quaternion.
The reason for this is that the vector part of the quaternion is the axis of rotation, and the
absolute value of the vector part is the angle of rotation.
The reason for normalizing the first quaternion is that the first quaternion is the predicted
quaternion, and the predicted quaternion is not always normalized.
The reason for returning the absolute value of the vector part of the resulting quaternion is that
the angle of rotation is always positive.
The reason for returning the vector part of the resulting quaternion is that the axis of rotation is
always a vector.
:param y_true: the ground truth quaternion
:param y_pred: the predicted quaternion
:return: The absolute value of the quaternion multiplication of the predicted and true quaternions.
"""
    y_pred = tf.linalg.normalize(y_pred, ord='euclidean', axis=1)[0]
    # conjugate of the normalized prediction
    w0, x0, y0, z0 = tf.split(
        tf.multiply(y_pred, [1., -1., -1., -1.]), num_or_size_splits=4, axis=-1)
    w1, x1, y1, z1 = tf.split(y_true, num_or_size_splits=4, axis=-1)
    # Hamilton product conj(y_pred) * y_true
    w = w0*w1 - x0*x1 - y0*y1 - z0*z1
    w = tf.subtract(w, 1)  # the identity rotation has w = 1
    x = w0*x1 + x0*w1 + y0*z1 - z0*y1
    y = w0*y1 - x0*z1 + y0*w1 + z0*x1
    z = w0*z1 + x0*y1 - y0*x1 + z0*w1
    loss = tf.abs(tf.concat(values=[w, x, y, z], axis=-1))
    return tf.reduce_mean(loss)
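The model is then compiled exactly as before, only passing this function as the loss (a sketch; all other arguments stay the same):
model.compile(loss=QQuat_mult, optimizer=tf.keras.optimizers.Adam(), metrics=['mse'], run_eagerly=True)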