Tags: tensorflow, tensorflow2.0

Problems loading weights into a custom TF model


I encountered this problem on Google Colab (TF version 2.16.1, Keras version 3.3.3).

Here is the error:

ValueError                                Traceback (most recent call last)
<ipython-input-46-4d65c8f7b80b> in <cell line: 1>()
----> 1 m3.load_weights("./model_v8.weights.h5")

1 frames
/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
    120             # To get the full stack trace, call:
    121             # `keras.config.disable_traceback_filtering()`
--> 122             raise e.with_traceback(filtered_tb) from None
    123         finally:
    124             del filtered_tb

/usr/local/lib/python3.10/dist-packages/keras/src/saving/saving_lib.py in _raise_loading_failure(error_msgs, warn_only)
    293         warnings.warn(msg)
    294     else:
--> 295         raise ValueError(msg)
    296 
    297 

ValueError: A total of 1 objects could not be loaded. Example error message for object <Embedding name=embedding_10, built=True>:

Layer 'embedding_10' expected 1 variables, but received 0 variables during loading. Expected: ['embeddings']

List of objects that could not be loaded:
[<Embedding name=embedding_10, built=True>]

I'm implementing a seq2seq model with Bahdanau attention. The weights were trained from the same model yesterday. Before loading, I run a translation with the translate() method, because it seems the weights cannot be loaded if I just instantiate the model and then call load_weights().
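Roughly, the loading step looks like this (a sketch; input_tokenizer, output_tokenizer, EMBEDDING_SIZE and HIDDEN_UNITS stand in for the objects and constants defined earlier in my notebook):

m3 = NMT(input_tokenizer, output_tokenizer, EMBEDDING_SIZE, HIDDEN_UNITS)
m3.translate("how are you")                # forward pass so the variables get created
m3.load_weights("./model_v8.weights.h5")   # this is the line that raises the ValueError above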

Here is the code for my model.
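For reference, the snippets below assume the usual Keras imports and a DROPOUT constant, roughly like this (the exact values live in the notebook):

import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import (Activation, Add, Bidirectional, Dense,
                                     Embedding, Layer, LayerNormalization, LSTM)

DROPOUT = 0.2  # placeholder; the real value is defined in the notebook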

Encoder:

class Encoder(Layer):
    def __init__(self,
                tokenizer,
                embedding_size,
                hidden_units):
        """
            Encoder Block in seq2seq

        :param tokenizer: tokenizer of the source language
        :param embedding_size: dimensionality of the embedding layer
        :param hidden_units: dimensionality of the output
        """

        super().__init__()
        self.tokenizer = tokenizer
        self.embedding_size = embedding_size
        self.hidden_units = hidden_units
        self.vocab_size = tokenizer.vocabulary_size()
        self.embedding = Embedding(input_dim=self.vocab_size,
                                   output_dim=embedding_size)
        self.rnn = Bidirectional(
            merge_mode="sum",
            layer=LSTM(units=hidden_units,
                    dropout=DROPOUT,
                    return_sequences=True,
                    return_state=True))

    def call(self,
            x,
            training=True):
        """
        :param x: [batch, time_steps]
        :param training: is training or not
        :return:
            encoder_outputs: [batch, time_steps, hidden_units]
            state_h: [batch, hidden_state_dim]
            state_c: [batch, hidden_state_dim]
        """
        mask = tf.where(x != 0, True, False)
        x = self.embedding(x)
        x, forward_h, forward_c, backward_h, backward_c = self.rnn(x, mask=mask,
                                                                training=training)

        return x, forward_h + backward_h, forward_c + backward_c

Bahdanau Attention:

class BahdanauAttention(Layer):
    def __init__(self,
                 hidden_units):
        super().__init__()
        self.Va = Dense(1)
        self.Wa = Dense(hidden_units)
        self.Ua = Dense(hidden_units)
        self.norm = LayerNormalization()
        self.tanh = Activation(tf.keras.activations.tanh)
        self.add = Add()

    def call(self,
             context, x):
        """
            Calculate the context vector based on all encoder hidden states and
            previous decoder state.

        :param: context: tensor, all encoder hidden states
        :param: x: tensor, previous state from Decoder
        :return:
            context_vector: tensor, the calculated context vector based on the
            input parameters
        """
        # Expand dims to ensure scores shape = [batch, Ty, Tx]
        context = tf.expand_dims(context, axis=1)
        x = tf.expand_dims(x, axis=2)

        scores = self.Va(self.tanh(self.add([self.Wa(context), self.Ua(x)])))
        scores = tf.squeeze(scores, axis=-1)
        attn_weights = tf.nn.softmax(scores, axis=-1)

        # NOTE: context has shape [batch, 1, Tx, features], so expand the
        # attention weights to match before the weighted sum
        context_vector = tf.expand_dims(attn_weights, axis=-1) * context
        context_vector = tf.reduce_sum(context_vector, axis=-2)
        context_vector = self.norm(context_vector)
        context_vector = self.add([context_vector, tf.squeeze(x, -2)])

        return context_vector

Decoder:

class Decoder(Layer):
    def __init__(self,
                tokenizer,
                embedding_size,
                hidden_units):
        """
            Decoder Block in seq2seq

        :param tokenizer: tokenizer of the target language
        :param embedding_size: dimensionality of the embedding layer
        :param hidden_units: dimensionality of the output
        """

        super().__init__()
        self.tokenizer = tokenizer
        self.embedding_size = embedding_size
        self.hidden_units = hidden_units
        self.vocab = tokenizer.get_vocabulary()
        self.vocab_size = tokenizer.vocabulary_size()
        self.embedding = Embedding(input_dim=self.vocab_size,
                                output_dim=embedding_size)
        self.rnn = LSTM(units=hidden_units,
                        dropout=DROPOUT,
                        return_sequences=True,
                        return_state=True)
        self.attention = BahdanauAttention(hidden_units)
        self.dense = Dense(self.vocab_size)

    def call(self,
            context, x,
            encoder_state,
            training=True,
            return_state=False):
        """
        :param context: all encoder states
        :param x: all initial decoder states
        :param encoder_state: last state from encoder
        :param training:
        :param return_state:

        :return:
            logits:
            state_h: hidden state
            state_c: cell state
        """
        mask = tf.where(x != 0, True, False)
        x = self.embedding(x)
        decoder_outputs, state_h, state_c = self.rnn(x, initial_state=encoder_state,
                                                    mask=mask,
                                                    training=training)
        dense_inputs = self.attention(context, decoder_outputs)
        logits = self.dense(dense_inputs)

        if return_state:
            return logits, state_h, state_c
        else:
            return logits

Model:

class NMT(Model):
    @classmethod
    def add_method(cls, fun):
        setattr(cls, fun.__name__, fun)
        return fun

    def __init__(self,
                 input_tokenizer,
                 output_tokenizer,
                 embedding_size,
                 hidden_units):
        """
            Initialize an instance for Neural Machine Translation Task

        :param input_tokenizer: tokenizer of the input language
        :param output_tokenizer: tokenizer of the output language
        :param embedding_size: dimensionality of embedding layer
        :param hidden_units: dimensionality of the output
        """

        super().__init__()
        self.input_tokenizer = input_tokenizer
        self.output_tokenizer = output_tokenizer
        self.embedding_size = embedding_size
        self.hidden_units = hidden_units
        self.encoder = Encoder(input_tokenizer,
                               embedding_size,
                               hidden_units)
        self.decoder = Decoder(output_tokenizer,
                               embedding_size,
                               hidden_units)

    def call(self,
             inputs):
        encoder_inputs, decoder_inputs = inputs
        encoder_outputs, state_h, state_c = self.encoder(encoder_inputs)
        logits = self.decoder(encoder_outputs, decoder_inputs,
                              [state_h, state_c])

        return logits

# translate() is attached to NMT after the class definition via add_method
@NMT.add_method
def translate(self, next_inputs, maxlen=40):
    """
        Translate a single source sentence by sampling one output token at a
        time, up to maxlen tokens.
    """
    def sampling(logits):
        probs = tf.nn.softmax(logits)
        dist = probs.numpy().squeeze()
        idx = np.random.choice(range(self.decoder.vocab_size), p=dist)

        return idx

    translation = []
    next_inputs = expand_contractions(next_inputs.lower(), en_contraction_map)
    next_idx = np.asarray(self.encoder.tokenizer(next_inputs))

    while next_idx.ndim != 2:
        next_idx = tf.expand_dims(next_idx, axis=0)

    encoder_outputs, state_h, state_c = self.encoder(next_idx, training=False)

    next_inputs = "[START]"
    next_idx = np.asarray(word_to_idx[next_inputs])

    for i in range(maxlen):
        while next_idx.ndim != 2:
            next_idx = tf.expand_dims(next_idx, axis=0)

        logits, state_h, state_c = self.decoder(encoder_outputs, next_idx,
                                                [state_h, state_c],
                                                training=False,
                                                return_state=True)
        next_idx = sampling(logits)
        next_inputs = self.decoder.vocab[next_idx]

        if next_inputs == "[END]":
            break
        elif next_inputs == "[UNK]":
            continue
        else:
            translation.append(next_inputs)

    return " ".join(translation)

Here is the link to the notebook if you want to take a look at it: https://colab.research.google.com/drive/1EKOm7ULFKEusvEFb8thSGOHTfRnRKru7?usp=sharing

I searched for solutions, but they are mostly about checking the TF and Keras versions. Inspecting the weight file with h5py shows that the Keras version recorded in it is 3.3.3, so that does not seem to be the problem. I also tried saving the entire model instead, with the custom objects serialized via get_config() and from_config(), but the same error appeared, along with a warning about the build() method of the Decoder class, so I went back to saving only the weights and loading them.
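What I tried for whole-model saving looked roughly like this (a simplified sketch, not my exact code; the Encoder and attention layer got the same treatment):

class Decoder(Layer):
    ...
    def get_config(self):
        # serialize the constructor arguments so Keras can rebuild the layer
        config = super().get_config()
        config.update({
            "tokenizer": tf.keras.saving.serialize_keras_object(self.tokenizer),
            "embedding_size": self.embedding_size,
            "hidden_units": self.hidden_units,
        })
        return config

    @classmethod
    def from_config(cls, config):
        config["tokenizer"] = tf.keras.saving.deserialize_keras_object(config["tokenizer"])
        return cls(**config)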

I also noticed that in TF's Saving and Loading Models tutorial, weights are loaded directly into a freshly built model in one of the examples. How can I achieve that? Also, is there any guidance on implementing build() for a custom Layer? My Decoder is composed of several other Keras layers, so I can't figure out how to implement one.
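My rough guess at a build() for the Decoder would just build the sub-layers from the incoming shape, something like the unverified sketch below, but the attention layer takes two inputs and I don't know how to handle it:

def build(self, input_shape):
    # input_shape is assumed to be (batch, time_steps) of token ids
    batch, time_steps = input_shape[0], input_shape[1]
    self.embedding.build((batch, time_steps))
    self.rnn.build((batch, time_steps, self.embedding_size))
    self.dense.build((batch, time_steps, self.hidden_units))
    # self.attention takes two inputs (context, x); I am not sure what shapes
    # to pass here, which is exactly the part I cannot figure out.
    super().build(input_shape)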


Solution

  • It turned out that the default TF version on Colab is 2.15.1, and it was upgraded to 2.16.1 when I installed tensorflow_text at the head of the notebook without specifying a version. After pinning tensorflow_text to 2.15.1, the weights loaded flawlessly (the pin is sketched below).
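For example, the install at the head of the notebook looks roughly like this (assuming the PyPI package name tensorflow-text and a 2.15.x Colab runtime):

# Pin tensorflow_text so pip does not pull in a newer TensorFlow/Keras as a
# dependency; match the pin to the TF version preinstalled in the runtime.
!pip install "tensorflow-text==2.15.*"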