Tags: tensorflow, machine-learning, embedding

InvalidArgumentError: Graph execution error: Received a label value of 83193 which is outside the valid range of [0, 128)


I tried to build a model (for a simple chatbot), but I'm stuck on the Embedding layer. I have:

X_train shape = (100082, 1307) --> questions
y_train shape = (100082, 1307) --> answers

where 1307 is the maximum length of my padded sequences.

Below is my code:

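import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

model = Sequential([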
    Embedding(total_tokens, 128, input_length=max_seq_length),
    # Bidirectional(LSTM(150)),
    # Dense(total_tokens, activation='softmax')
])
model.summary()

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=50)

But when I trained the model, the following error appeared: Received a label value of 83193 which is outside the valid range of [0, 128).

How should I solve this problem?


Solution

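  • The issue arises because your labels in y_train contain integers outside the valid interval [0, 128). With loss='sparse_categorical_crossentropy', each label is treated as a class index into the last dimension of the model's output. Since the Embedding layer is the last layer of your model, that dimension is its output size of 128, so only labels 0 through 127 are valid, while your answer token ids go up to at least 83193.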

    Let's illustrate this with a simplified example. Here I reduce the shape from (100082, 1307) to (1000, 1307):

    We use a random generator to create an integer y_train tensor whose values all lie in [0, 128):

    y_train = tf.random.uniform(shape=(1000, 1307), minval=0, maxval=128, dtype=tf.int32)
    

    Then we train the model for 10 epochs:

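    import tensorflow as tf
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Embedding

    total_tokens = 1000     # vocabulary size: 1000 * 128 = 128000 params in the summary below
    max_seq_length = 1307   # padded sequence length
    # X_train is not shown in this answer; assumed here to be random token ids
    X_train = tf.random.uniform(shape=(1000, max_seq_length), maxval=total_tokens, dtype=tf.int32)
    model = Sequential([
        Embedding(total_tokens, 128, input_length=max_seq_length),
    ])
    model.summary()
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=10)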
    

    Here is the output:

    Model: "sequential_1"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #   
    =================================================================
     embedding_1 (Embedding)     (None, 1307, 128)         128000    
                                                                     
    =================================================================
    Total params: 128000 (500.00 KB)
    Trainable params: 128000 (500.00 KB)
    Non-trainable params: 0 (0.00 Byte)
    _________________________________________________________________
    Epoch 1/10
    32/32 [==============================] - 4s 124ms/step - loss: 10.5185 - accuracy: 0.0000e+00
    Epoch 2/10
    32/32 [==============================] - 5s 152ms/step - loss: 10.4612 - accuracy: 0.0000e+00
    Epoch 3/10
    32/32 [==============================] - 4s 116ms/step - loss: 10.4123 - accuracy: 0.0000e+00
    Epoch 4/10
    32/32 [==============================] - 4s 116ms/step - loss: 10.3636 - accuracy: 0.0000e+00
    Epoch 5/10
    32/32 [==============================] - 6s 175ms/step - loss: 10.3126 - accuracy: 0.0000e+00
    Epoch 6/10
    32/32 [==============================] - 4s 120ms/step - loss: 10.2577 - accuracy: 0.0000e+00
    Epoch 7/10
    32/32 [==============================] - 4s 119ms/step - loss: 10.1967 - accuracy: 0.0000e+00
    Epoch 8/10
    32/32 [==============================] - 5s 152ms/step - loss: 10.1267 - accuracy: 0.0000e+00
    Epoch 9/10
    32/32 [==============================] - 4s 119ms/step - loss: 10.0431 - accuracy: 0.0000e+00
    Epoch 10/10
    32/32 [==============================] - 4s 119ms/step - loss: 9.9369 - accuracy: 7.6511e-07
    <keras.src.callbacks.History at 0x7de0b88f0ee0>
    

    In this example, the model runs without errors.

    However, if the upper limit of the generator is set to 200 (larger than 128):

    y_train = tf.random.uniform(shape=(1000, 1307), minval=0, maxval=200, dtype=tf.int32)
    

    An error occurs:

    Received a label value of 199 which is outside the valid range of [0, 128).
    

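    Therefore, you should ensure that every label in y_train falls within the valid range [0, N), where N is the output size of your model's last layer; in your model that is the Embedding layer's output dimension of 128. If your labels are token ids from the whole vocabulary, the last layer needs total_tokens outputs instead.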
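The error itself comes from the sparse cross-entropy op that, in the TensorFlow backend, sparse_categorical_crossentropy ends up calling: each label must be an index into the last dimension of the predictions. A minimal sketch reproducing it in isolation, assuming TensorFlow 2.x and 4 hypothetical positions with 128 class scores each (mirroring the Embedding output size of 128):

```python
import tensorflow as tf

num_classes = 128
logits = tf.zeros((4, num_classes))          # 4 positions, 128 class scores each
good_labels = tf.constant([0, 5, 64, 127])   # all inside [0, 128)
bad_labels = tf.constant([0, 5, 64, 199])    # 199 is outside [0, 128)

with tf.device("/CPU:0"):
    # Uniform logits give every class probability 1/128, so each loss is ln(128) ≈ 4.852
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=good_labels, logits=logits)
    print(loss.numpy())
    try:
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=bad_labels, logits=logits)
    except tf.errors.InvalidArgumentError as err:
        # Message contains "Received a label value of 199 which is outside the valid range of [0, 128)"
        print(err.message)
```

According to the documentation of tf.nn.sparse_softmax_cross_entropy_with_logits, out-of-range labels raise an exception on CPU but produce NaN loss rows on GPU, so on a GPU the same mistake may show up as a NaN loss rather than this error.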
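To catch this before training, the label range can be compared against the number of classes the model outputs. A minimal sketch, assuming integer labels in a NumPy array; check_label_range is a hypothetical helper, and num_classes is the size of the last dimension of the model's predictions (128 for the Embedding-only model above; in tf.keras 2.x this is typically model.output_shape[-1]):

```python
import numpy as np


def check_label_range(labels, num_classes):
    """Hypothetical helper: raise ValueError if any label is outside [0, num_classes).

    sparse_categorical_crossentropy needs every label to be a valid class
    index into the last dimension of the model's predictions.
    """
    labels = np.asarray(labels)
    lo, hi = int(labels.min()), int(labels.max())
    if lo < 0 or hi >= num_classes:
        raise ValueError(
            f"labels span [{lo}, {hi}] but the valid range is [0, {num_classes})"
        )
    return lo, hi


# Random data with the reduced shape used in this answer, for illustration only.
rng = np.random.default_rng(0)
y_ok = rng.integers(0, 128, size=(1000, 1307))    # upper bound is exclusive: values in [0, 128)
y_bad = rng.integers(0, 200, size=(1000, 1307))   # values in [0, 200)

print(check_label_range(y_ok, num_classes=128))
try:
    check_label_range(y_bad, num_classes=128)
except ValueError as err:
    print(err)
```

Running such a check on the real y_train with num_classes=128 would fail immediately, pointing at the same mismatch as the InvalidArgumentError but without waiting for graph execution.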
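If the goal is to predict one vocabulary token per position, as the commented-out Dense(total_tokens, activation='softmax') line in the question suggests, the last layer has to produce total_tokens scores per position so that labels in [0, total_tokens) become valid. A minimal sketch, assuming TensorFlow 2.x and small hypothetical sizes so it runs quickly; note that the LSTM needs return_sequences=True to keep one output per time step, matching labels of shape (batch, seq_len). This only illustrates the output shape, not a recommended chatbot architecture:

```python
import numpy as np
import tensorflow as tf

# Hypothetical small sizes so the sketch runs quickly.
total_tokens = 500      # vocabulary size; labels must lie in [0, total_tokens)
max_seq_length = 20     # padded sequence length
num_samples = 64

rng = np.random.default_rng(0)
X_train = rng.integers(0, total_tokens, size=(num_samples, max_seq_length))
y_train = rng.integers(0, total_tokens, size=(num_samples, max_seq_length))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_seq_length,), dtype="int32"),
    tf.keras.layers.Embedding(total_tokens, 128),
    # return_sequences=True keeps one output per time step: (batch, seq_len, 300)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(150, return_sequences=True)),
    # total_tokens probabilities per time step: (batch, seq_len, total_tokens)
    tf.keras.layers.Dense(total_tokens, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(X_train, y_train, epochs=1, batch_size=32, verbose=0)
print(model.predict(X_train[:2], verbose=0).shape)
```

With this layout the error's valid range becomes [0, total_tokens) instead of [0, 128), which matches labels that are vocabulary token ids.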