I found plenty of seemingly related Stack Overflow posts with the same error message when fitting neural network models, but none of them seemed to relate directly to my use case, i.e. fitting with the sparse_categorical_crossentropy loss function. I'm aware I could alternatively use categorical_crossentropy by first encoding the target variables to one-hot form using to_categorical(), but due to the large number of target classes I would run into memory issues with that approach, hence a sparse method is the only reasonable workaround.
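To make the trade-off concrete, here is a minimal NumPy sketch (not the Keras implementation) of why the sparse loss is attractive: on integer labels it gives the same value as categorical cross-entropy on one-hot labels, while the one-hot targets need num_classes times as many elements. The array sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes = 8, 129  # illustrative sizes, not from a real dataset

# Fake softmax outputs: each row is a probability distribution over classes.
logits = rng.normal(size=(n_samples, n_classes))
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

# Sparse targets: one integer class index per sample.
y_sparse = rng.integers(0, n_classes, size=n_samples)

# One-hot targets: one row of length n_classes per sample.
y_onehot = np.eye(n_classes)[y_sparse]

# Sparse cross-entropy: look up the predicted probability of the true class.
sparse_ce = -np.log(probs[np.arange(n_samples), y_sparse]).mean()

# Categorical cross-entropy: weight the log-probabilities by the one-hot row.
categorical_ce = -(y_onehot * np.log(probs)).sum(axis=1).mean()

print(np.isclose(sparse_ce, categorical_ce))  # both losses agree
print(y_sparse.size, y_onehot.size)           # element counts of the two encodings
```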
Below I have included sample data and a full reproducible example. The error is raised on the model.fit(X, y) line, and the error message is as follows:
ValueError: in user code:

    File "...\.venv\lib\site-packages\keras\engine\training.py", line 1021, in train_function *
        return step_function(self, iterator)
    File "...\.venv\lib\site-packages\keras\engine\training.py", line 1010, in step_function **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "...\.venv\lib\site-packages\keras\engine\training.py", line 1000, in run_step **
        outputs = model.train_step(data)
    File "...\.venv\lib\site-packages\keras\engine\training.py", line 864, in train_step
        return self.compute_metrics(x, y, y_pred, sample_weight)
    File "...\.venv\lib\site-packages\keras\engine\training.py", line 957, in compute_metrics
        self.compiled_metrics.update_state(y, y_pred, sample_weight)
    File "...\.venv\lib\site-packages\keras\engine\compile_utils.py", line 459, in update_state
        metric_obj.update_state(y_t, y_p, sample_weight=mask)
    File "...\.venv\lib\site-packages\keras\utils\metrics_utils.py", line 70, in decorated
        update_op = update_state_fn(*args, **kwargs)
    File "...\.venv\lib\site-packages\keras\metrics.py", line 178, in update_state_fn
        return ag_update_state(*args, **kwargs)
    File "...\.venv\lib\site-packages\keras\metrics.py", line 2364, in update_state **
        label_weights=label_weights)
    File "...\.venv\lib\site-packages\keras\utils\metrics_utils.py", line 619, in update_confusion_matrix_variables
        y_pred.shape.assert_is_compatible_with(y_true.shape)

    ValueError: Shapes (32, 129) and (32, 1) are incompatible
Full code:
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Flatten
from keras.preprocessing.text import Tokenizer
train_data = ['o by no means honest ventidius i gave it freely ever and theres none can truly say he gives if our betters play at that game we must not dare to imitate them faults that are rich are fair'
'but was not this nigh shore'
'impairing henry strengthening misproud york the common people swarm like summer flies and whither fly the gnats but to the sun'
'what while you were there'
'chill pick your teeth zir come no matter vor your foins'
'thanks dear isabel' 'come prick me bullcalf till he roar again'
'go some of you knock at the abbeygate and bid the lady abbess come to me'
'an twere not as good deed as drink to break the pate on thee i am a very villain'
'beaufort it is thy sovereign speaks to thee'
'but say lucetta now we are alone wouldst thou then counsel me to fall in love'
'for being a bawd for being a bawd'
'all blest secrets all you unpublishd virtues of the earth spring with my tears'
'what likelihood' 'o find him']
max_len = 100
# Tokenize
train_data_flattened = " ".join(train_data).split()
sequences = list()
for i in range(max_len+1, len(train_data_flattened)):
    seq = train_data_flattened[i-max_len-1:i]
    sequences.append(seq)
# Encode
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sequences)
vocab_size = len(tokenizer.word_index)
encoded_sequences = np.array(tokenizer.texts_to_sequences(sequences))
X = encoded_sequences[:,:-1]
y = encoded_sequences[:,-1]
def create_nn(input_shape=(100,1), output_shape=None):
    model = Sequential()
    model.add(LSTM(64, input_shape=input_shape, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(Flatten())
    model.add(Dense(output_shape, activation='softmax'))
    metrics_list = [
        tf.keras.metrics.AUC(name='auc'),
        # tf.keras.metrics.BinaryAccuracy(name='accuracy'),
        tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'),
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall'),
    ]
    sparse_cat_crossentropy = tf.losses.SparseCategoricalCrossentropy(from_logits=False)
    model.compile(optimizer='adam', loss=sparse_cat_crossentropy, metrics=metrics_list)
    return model
model = create_nn(output_shape=vocab_size)
model.fit(X, y)
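The error is actually coming from the metrics you are using, not from the loss. AUC, Precision, and Recall expect y_true and y_pred to have the same shape (see the assert_is_compatible_with call at the bottom of your traceback), but with sparse integer labels y_true has shape (32, 1) while the softmax output has shape (32, 129). I do not think it makes much sense to use these metrics together with the SparseCategoricalCrossentropy loss in your case. Here is a working example: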
import tensorflow as tf
def create_nn(input_shape=(100,1), output_shape=None):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.LSTM(64, input_shape=input_shape, return_sequences=False))
    model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(output_shape, activation='softmax'))
    metrics_list = [
        tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'),
    ]
    sparse_cat_crossentropy = tf.losses.SparseCategoricalCrossentropy(from_logits=False)
    model.compile(optimizer='adam', loss=sparse_cat_crossentropy, metrics=metrics_list)
    return model
vocab_size = 129
model = create_nn(output_shape=vocab_size)
X = tf.random.uniform((500, 100, 1), maxval=vocab_size, dtype=tf.int32)
y = tf.random.uniform((500, 1), maxval=vocab_size, dtype=tf.int32)
model.fit(X, y, batch_size=64, epochs=2)
Epoch 1/2
8/8 [==============================] - 4s 23ms/step - loss: 4.9432 - accuracy: 0.0100
Epoch 2/2
8/8 [==============================] - 0s 22ms/step - loss: 4.9149 - accuracy: 0.0100
<keras.callbacks.History at 0x7fd59abb5510>
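If you still want precision and recall for the multi-class problem, one option is to compute them after training from the integer labels and the arg-max of the predictions, which avoids one-hot encoding the targets. Below is a minimal NumPy sketch under that assumption. Note that macro_precision_recall is a hypothetical helper, not a Keras API, and the toy arrays stand in for y and model.predict(X).argmax(axis=-1).

```python
import numpy as np

def macro_precision_recall(y_true, y_pred, num_classes):
    """Macro-averaged precision and recall from integer class labels."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Per-class counts; no (n_samples, num_classes) one-hot matrix is built.
    true_pos = np.bincount(y_true[y_true == y_pred], minlength=num_classes)
    pred_count = np.bincount(y_pred, minlength=num_classes)
    true_count = np.bincount(y_true, minlength=num_classes)
    # Classes that never occur score 0 here instead of dividing by zero.
    precision = np.divide(true_pos, pred_count,
                          out=np.zeros(num_classes), where=pred_count > 0)
    recall = np.divide(true_pos, true_count,
                       out=np.zeros(num_classes), where=true_count > 0)
    return precision.mean(), recall.mean()

# Toy stand-ins for the true labels and the arg-max predictions.
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 2, 2, 2, 1])
precision, recall = macro_precision_recall(y_true, y_pred, num_classes=3)
print(precision, recall)
```

With many rare classes, the macro average is dominated by classes that are never predicted, so averaging only over the classes that actually occur may be the better choice.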