ValueError: Shapes (32, 129) and (32, 1) are incompatible

I found plenty of seemingly related Stack Overflow posts with the same error message when fitting neural network models, but none of them relate directly to my use case: fitting with the sparse_categorical_crossentropy loss function. I'm aware I could use categorical_crossentropy instead by first one-hot encoding the targets with to_categorical(), but with this many target classes I would run into memory issues, so a sparse loss is the only reasonable option.
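
To put rough numbers on that memory concern, here is a back-of-the-envelope sketch. The sample and class counts are hypothetical and not taken from the data in this question, and label_memory_bytes is a made-up helper name. It assumes 4-byte entries, which matches the float32 default of to_categorical().

```python
def label_memory_bytes(num_samples, num_classes, itemsize=4):
    """Return (one_hot_bytes, sparse_bytes) for a label array.

    One-hot labels store num_classes entries per sample;
    sparse labels store a single integer per sample.
    """
    one_hot_bytes = num_samples * num_classes * itemsize
    sparse_bytes = num_samples * itemsize
    return one_hot_bytes, sparse_bytes


# Hypothetical sizes: 1M training windows, 50k-word vocabulary.
one_hot, sparse = label_memory_bytes(1_000_000, 50_000)
print(f"one-hot: {one_hot / 1e9:.0f} GB, sparse: {sparse / 1e6:.0f} MB")
```

The one-hot array grows with the product of samples and classes, while the sparse labels grow only with the number of samples.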

Below is some sample data and a fully reproducible example. The error is raised on the model.fit(X, y) line, with the following message:

ValueError: in user code:

    File "...\.venv\lib\site-packages\keras\engine\training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "...\.venv\lib\site-packages\keras\engine\training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "...\.venv\lib\site-packages\keras\engine\training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "...\.venv\lib\site-packages\keras\engine\training.py", line 864, in train_step
        return self.compute_metrics(x, y, y_pred, sample_weight)
    File "...\.venv\lib\site-packages\keras\engine\training.py", line 957, in compute_metrics
        self.compiled_metrics.update_state(y, y_pred, sample_weight)
    File "...\.venv\lib\site-packages\keras\engine\compile_utils.py", line 459, in update_state
        metric_obj.update_state(y_t, y_p, sample_weight=mask)
    File "...\.venv\lib\site-packages\keras\utils\metrics_utils.py", line 70, in decorated
        update_op = update_state_fn(*args, **kwargs)
    File "...\.venv\lib\site-packages\keras\metrics.py", line 178, in update_state_fn
        return ag_update_state(*args, **kwargs)
    File "...\.venv\lib\site-packages\keras\metrics.py", line 2364, in update_state  **
        label_weights=label_weights)
    File "...\.venv\lib\site-packages\keras\utils\metrics_utils.py", line 619, in update_confusion_matrix_variables
        y_pred.shape.assert_is_compatible_with(y_true.shape)

    ValueError: Shapes (32, 129) and (32, 1) are incompatible

Full code:

import numpy as np 
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Flatten    
from keras.preprocessing.text import Tokenizer


train_data = ['o by no means honest ventidius i gave it freely ever and theres none can truly say he gives if our betters play at that game we must not dare to imitate them faults that are rich are fair',
 'but was not this nigh shore',
 'impairing henry strengthening misproud york the common people swarm like summer flies and whither fly the gnats but to the sun',
 'what while you were there',
 'chill pick your teeth zir come no matter vor your foins',
 'thanks dear isabel', 'come prick me bullcalf till he roar again',
 'go some of you knock at the abbeygate and bid the lady abbess come to me',
 'an twere not as good deed as drink to break the pate on thee i am a very villain',
 'beaufort it is thy sovereign speaks to thee',
 'but say lucetta now we are alone wouldst thou then counsel me to fall in love',
 'for being a bawd for being a bawd',
 'all blest secrets all you unpublishd virtues of the earth spring with my tears',
 'what likelihood', 'o find him']

max_len = 100

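# Split into words and build sliding windows of max_len + 1 words
# (max_len input words followed by 1 target word)
train_data_flattened = " ".join(train_data).split()
sequences = []
for i in range(max_len+1, len(train_data_flattened)):
    seq = train_data_flattened[i-max_len-1:i]
    sequences.append(seq)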

# Encode
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sequences)
vocab_size = len(tokenizer.word_index)
encoded_sequences = np.array(tokenizer.texts_to_sequences(sequences))
        
# First max_len tokens of each window are the input, the last token is the target
X = encoded_sequences[:,:-1]
y = encoded_sequences[:,-1]

def create_nn(input_shape=(100,1), output_shape=None):

    model = Sequential()
    model.add(LSTM(64, input_shape=input_shape, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(Flatten())
    model.add(Dense(output_shape, activation='softmax'))
    
    metrics_list = [
        tf.keras.metrics.AUC(name='auc'),
        # tf.keras.metrics.BinaryAccuracy(name='accuracy'),
        tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'),
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall'),
    ]

    sparse_cat_crossentropy = tf.losses.SparseCategoricalCrossentropy(from_logits=False)

    model.compile(optimizer = 'adam', loss = sparse_cat_crossentropy, metrics = metrics_list)
    return model

model = create_nn(output_shape=vocab_size)
model.fit(X, y)

Solution

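The error is coming from the metrics you are using, not from the loss. The traceback ends in update_confusion_matrix_variables, which asserts that y_pred and y_true have compatible shapes. AUC, Precision, and Recall go through that check and expect labels shaped like the predictions, (32, 129) here, while your sparse integer labels have shape (32, 1). I do not think it makes much sense to use AUC, Precision, and Recall with the SparseCategoricalCrossentropy loss in your case; SparseCategoricalAccuracy is the metric that accepts integer labels. The example below also sets return_sequences=False on the LSTM, so only the final hidden state feeds the Dense layer, and uses random integers in place of your tokenized text. Here is a working example: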

import tensorflow as tf

def create_nn(input_shape=(100,1), output_shape=None):

    model = tf.keras.Sequential()
    model.add(tf.keras.layers.LSTM(64, input_shape=input_shape, return_sequences=False))
    model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(output_shape, activation='softmax'))
    
    metrics_list = [
        tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'),
    ]

    sparse_cat_crossentropy = tf.losses.SparseCategoricalCrossentropy(from_logits=False)

    model.compile(optimizer = 'adam', loss = sparse_cat_crossentropy, metrics = metrics_list)
    return model

vocab_size = 129
model = create_nn(output_shape=vocab_size)

X = tf.random.uniform((500, 100, 1), maxval=vocab_size, dtype=tf.int32)
y = tf.random.uniform((500, 1), maxval=vocab_size, dtype=tf.int32)
model.fit(X, y, batch_size=64, epochs=2)

Epoch 1/2
8/8 [==============================] - 4s 23ms/step - loss: 4.9432 - accuracy: 0.0100
Epoch 2/2
8/8 [==============================] - 0s 22ms/step - loss: 4.9149 - accuracy: 0.0100
<keras.callbacks.History at 0x7fd59abb5510>
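
If precision and recall are still wanted, one option is to compute them after training from the predicted class indices instead of as Keras metrics. The following is only a NumPy sketch under assumptions: sparse_accuracy and macro_precision_recall are made-up helper names, not Keras API, and the random arrays stand in for y and for the output of model.predict(X). It also shows the two shapes from the error message side by side.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_classes = 32, 129

# Stand-in for a softmax output: one row of class probabilities per sample.
logits = rng.normal(size=(batch_size, num_classes))
y_pred = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
# Sparse labels: one integer class index per sample.
y_true = rng.integers(0, num_classes, size=(batch_size, 1))

print(y_pred.shape, y_true.shape)  # (32, 129) (32, 1)


def sparse_accuracy(y_true, y_pred):
    """Fraction of samples whose argmax prediction equals the integer label."""
    return float(np.mean(np.argmax(y_pred, axis=-1) == y_true.reshape(-1)))


def macro_precision_recall(y_true, y_pred, num_classes):
    """Macro-averaged precision and recall over classes present in y_true."""
    labels = y_true.reshape(-1)
    preds = np.argmax(y_pred, axis=-1)

    # confusion[i, j] counts samples with true class i predicted as class j.
    confusion = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(confusion, (labels, preds), 1)

    tp = np.diag(confusion).astype(float)
    predicted = confusion.sum(axis=0)  # how often each class was predicted
    actual = confusion.sum(axis=1)     # how often each class actually occurs

    # Classes that were never predicted (or never occur) get a score of 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)

    seen = actual > 0
    return float(precision[seen].mean()), float(recall[seen].mean())


print(sparse_accuracy(y_true, y_pred))
print(macro_precision_recall(y_true, y_pred, num_classes))
```

Because this works on class indices, no one-hot label array is ever built, so it keeps the memory advantage of the sparse loss.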