I have a dataset of 200k+ color patches captured on two different mediums that I'm building a color transformation for. Initially I trained the neural network on a direct RGB-to-RGB input-output mapping. That works decently well, but I wanted to perform the match in a luminance-chrominance space to potentially better translate luminance and color contrast relationships. I first tried CIELAB and YCbCr, but transforming the dataset into either space is ultimately inaccurate, as the data represents HDR scene data in a logarithmic container and neither space is built for HDR scene representation. So I'm attempting to use Dolby's ICtCp space, which is built from unbounded scene-linear information. I performed the transformation into the space and confirmed the output and array structure to be correct. However, upon feeding the variables into the network, it immediately starts giving astronomical losses before flipping over to inf and then nan loss. I can't figure out what the issue is.
I'm using the colour-science library for the internal color transforms, and I've tested both a custom loss written specifically for the ICtCp space and the MSE loss built into TF (to rule out a formatting issue). Both gave extreme losses. I also printed the RGB and ICtCp values to text files to make sure there weren't any out-of-range values, but that was not the issue: RGB values are in a 0-1 range and ICtCp values are in an I (0:1), Ct (-1:1), Cp (-1:1) range.
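That check boils down to something like the following (a rough numpy sketch rather than my exact script; report_ranges and the array names are just placeholders):

import numpy as np

def report_ranges(label, values):
    # values: (N, 3) array of RGB or ICtCp triplets
    print(label,
          "min:", values.min(axis=0),
          "max:", values.max(axis=0),
          "any NaN:", bool(np.isnan(values).any()),
          "any Inf:", bool(np.isinf(values).any()))

# e.g. report_ranges("source ICtCp", source_itp) on the converted arrays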
My color transform functions in and out of ICtCp
import colour

# DaVinci Wide Gamut Intermediate to Dolby ICtCp HDR opponent space
def DWG_TO_ITP(rgb_values):
    cs = colour.models.RGB_COLOURSPACE_DAVINCI_WIDE_GAMUT
    # DWG DI (log) to linear XYZ
    xyzLin = colour.RGB_to_XYZ(rgb_values, cs.whitepoint, cs.whitepoint,
                               cs.matrix_RGB_to_XYZ, cctf_decoding=cs.cctf_decoding)
    # XYZ to ICtCp
    ictcp = colour.XYZ_to_ICtCp(xyzLin)
    return ictcp

# Dolby ICtCp HDR opponent space to DaVinci Wide Gamut Intermediate
def ITP_TO_DWG(itp_values):
    cs = colour.models.RGB_COLOURSPACE_DAVINCI_WIDE_GAMUT
    # ICtCp to linear XYZ
    xyzLin = colour.ICtCp_to_XYZ(itp_values)
    # linear XYZ to DWG DI (log)
    dwg = colour.XYZ_to_RGB(xyzLin, cs.whitepoint, cs.whitepoint,
                            cs.matrix_XYZ_to_RGB, cctf_encoding=cs.cctf_encoding)
    return dwg
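As an extra sanity check on the conversion pair, a round trip like the following (arbitrary sample values, not part of the training code) should come back to the input:

import numpy as np

# DWG DI -> ICtCp -> DWG DI should reproduce the input for in-range values
sample_rgb = np.array([[0.18, 0.18, 0.18],
                       [0.50, 0.25, 0.10],
                       [0.90, 0.85, 0.80]])
recovered = ITP_TO_DWG(DWG_TO_ITP(sample_rgb))
print(np.abs(sample_rgb - recovered).max())  # ~0, up to float precision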
Custom Loss (not currently active)
import tensorflow as tf

def ITP_loss(y_true, y_pred):
    # Split the ICtCp values into I, T, and P components
    I_1, T_1, P_1 = tf.split(y_true, 3, axis=-1)
    I_2, T_2, P_2 = tf.split(y_pred, 3, axis=-1)
    # Scale the T (Ct) components by 0.5, as in the original delta_E_ITP function
    T_1 = T_1 * 0.5
    T_2 = T_2 * 0.5
    # Compute the Euclidean distance scaled by 720
    d_E_ITP = 720 * tf.sqrt(
        tf.square(I_2 - I_1) +
        tf.square(T_2 - T_1) +
        tf.square(P_2 - P_1)
    )
    # Return the mean error as the loss
    return tf.reduce_mean(d_E_ITP)
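On hand-made finite ICtCp triplets it evaluates to an ordinary, finite number (a quick sketch with arbitrary values):

import tensorflow as tf

y_true = tf.constant([[0.50,  0.00,  0.00],
                      [0.25, -0.10,  0.05]])
y_pred = tf.constant([[0.52,  0.01, -0.01],
                      [0.24, -0.12,  0.06]])
print(ITP_loss(y_true, y_pred).numpy())  # small finite delta E ITP, not inf/nan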
My neural network
import numpy as np
from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping

def transform_nn(combined_rgb_values, output_callback, epochs=10000, batch_size=32):
    source_rgb = np.vstack([rgb_pair[0] for rgb_pair in combined_rgb_values])
    target_rgb = np.vstack([rgb_pair[1] for rgb_pair in combined_rgb_values])
    source_itp = DWG_TO_ITP(source_rgb)
    target_itp = DWG_TO_ITP(target_rgb)

    # Neural network base model with L2 regularization
    alpha = 0  # no penalty for now
    model = keras.Sequential([
        keras.layers.Input(shape=(3,)),
        keras.layers.Dense(128, activation='gelu', kernel_regularizer=keras.regularizers.L2(alpha)),
        keras.layers.Dense(64, activation='gelu', kernel_regularizer=keras.regularizers.L2(alpha)),
        keras.layers.Dense(32, activation='gelu', kernel_regularizer=keras.regularizers.L2(alpha)),
        keras.layers.Dense(3)
    ])

    # Model optimization with Adam
    adam_optimizer = keras.optimizers.Adam(learning_rate=0.001)
    model.compile(
        optimizer=adam_optimizer,
        loss="mean_squared_error",
        metrics=['mean_squared_error'])

    # Early stopping on validation loss
    early_stopping_norm = EarlyStopping(
        monitor='val_loss',
        patience=30,
        verbose=1,
        restore_best_weights=True
    )

    # Train with early stopping
    history = model.fit(x=source_itp, y=target_itp,
                        epochs=epochs, batch_size=batch_size,
                        verbose="auto", validation_split=0.3,
                        callbacks=[early_stopping_norm])

    def interpolator(input_rgb):
        input_itp = DWG_TO_ITP(input_rgb)
        output_itp = model.predict(input_itp)
        output_rgb = ITP_TO_DWG(output_itp)
        return output_rgb

    return interpolator
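For context, it gets called roughly like this (a sketch; new_source_rgb is just a placeholder for another array of DWG DI values):

# combined_rgb_values: list of (source_rgb, target_rgb) pairs,
# each an array of DWG DI triplets in 0-1
interpolate = transform_nn(combined_rgb_values, output_callback=None)
matched_rgb = interpolate(new_source_rgb)  # DWG DI in, DWG DI out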
And lastly, the kind of losses I'm getting. Note: this is the mean_squared_error loss passed to model.compile, but similarly extreme values show up with the custom loss. I never ran into this issue with either the CIELAB or YCbCr implementation.
Epoch 1/10000
  70/9078 [..............................] - ETA: 6s - loss: 19151210161612029119172287351962936121302040109299793920.0000 - mean_squared_error: 1915121016161202911917228735196293612130
 150/9078 [..............................] - ETA: 6s - loss: 8937231408752302941104862160146934414914780835554000896.0000 - mean_squared_error: 89372314087523029411048621601469344149147
 236/9078 [..............................] - ETA: 5s - loss: 8411239422438024050387858001836140461389620391983448064.0000 - mean_squared_error: 84112394224380240503878580018361404613896
 322/9078 [>.............................] - ETA: 5s - loss: 55694874365583834449267799576553768559551931724848365789378071082067252634355826658906428848197067342214382161787617280.0000
 407/9078 [>.............................] - ETA: 5s - loss: 9272320170949610945087897503565859983725183487173275717008470165482614622395441710684957926712521227412477496744314184602899
 494/9078 [>.............................] - ETA: 5s - loss: inf - mean_squared_error: inf
9078/9078 [==============================] - 7s 686us/step - loss: nan - mean_squared_error: nan - val_loss: nan - val_mean_squared_error: nan
Epoch 2/10000
9078/9078 [==============================] - 6s 682us/step - loss: nan - mean_squared_error: nan - val_loss: nan - val_mean_squared_error: nan
Epoch 3/10000
8987/9078 [============================>.] - ETA: 0s - loss: nan - mean_squared_error: nan
I suspect it's because ICtCp to XYZ can result in NaN, for example for (0.24, -0.42, 0.48). In this case S' ≈ -0.15, and the PQ transfer function (EOTF) tries to raise that to the power 1 / 78.84375, and it isn't clear how a negative base should be handled there.
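That failure is easy to reproduce in isolation (a minimal sketch, assuming the PQ EOTF is evaluated with a plain floating-point power):

import numpy as np

# A negative base raised to a non-integer power has no real-valued result,
# so a straight power in the PQ EOTF turns S' ~= -0.15 into NaN:
print(np.power(-0.15, 1.0 / 78.84375))  # -> nan (RuntimeWarning: invalid value)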