Search code examples
pythontensorflowkerastext-classificationmulticlass-classification

Getting a ValueError: Shapes (None, 1) and (None, 5) are incompatible


X_train = df_train["Base_Reviews"].values
X_test  = df_test["Base_Reviews"].values

y_train = df_train['category'].values
y_test  = df_test['category'].values

num_words = 20000 #Max. workds to use per toxic comment
max_features = 15000 #Max. number of unique words in embeddinbg vector
max_len = 200 #Max. number of words per toxic comment to be use
embedding_dims = 128 #embedding vector output dimension 
num_epochs = 5 # (before 5)number of epochs (number of times that the model is exposed to the training dataset)
val_split = 0.2
batch_size2 = 256

tokenizer = tokenizer = Tokenizer(num_words = num_words, lower = False)
tokenizer.fit_on_texts(list(X_train))


X_train = tokenizer.texts_to_sequences(X_train)
X_test = tokenizer.texts_to_sequences(X_test)
X_train = sequence.pad_sequences(X_train, max_len)
X_test  = sequence.pad_sequences(X_test,  max_len)
print('X_train shape:', X_train.shape)
print('X_test shape: ', X_test.shape)

and this is the shape of our dataset: X_train shape: (11419, 200), X_test shape: (893, 200)

X_tra, X_val, y_tra, y_val = train_test_split(X_train, y_train, train_size =0.8, random_state=233)
early = EarlyStopping(monitor="val_loss", mode="min", patience=4)

nn_model = Sequential([
    Embedding(input_dim=max_features, input_length=max_len, output_dim=embedding_dims),
    GlobalMaxPool1D(),
    Dense(50, activation = 'relu'),
    Dropout(0.2),
    Dense(5, activation = 'softmax')
])

def mean_pred(y_true, y_pred):
return K.mean(y_pred)
nn_model.compile(loss="categorical_crossentropy", optimizer=Adam(0.01), metrics=['accuracy', mean_pred, fmeasure, precision, auroc, recall])

When I run the below code I get the above error.

nn_model.compile(loss="categorical_crossentropy", optimizer=Adam(0.01), metrics=['accuracy', mean_pred, fmeasure, precision, auroc, recall])

When I feed the data to NN Model the above error I get. How can I resolve the error? This is the error:

ValueError                               


Traceback (most recent call last)
<ipython-input-51-a3721a91aa0b> in <module>
----> 1 nn_model_fit = nn_model.fit(X_tra, y_tra, batch_size=batch_size2, epochs=num_epochs, validation_data=(X_val, y_val), callbacks=[early])

~\anaconda3\lib\site-packages\keras\utils\traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

~\anaconda3\lib\site-packages\tensorflow\python\framework\func_graph.py in autograph_handler(*args, **kwargs)
   1145           except Exception as e:  # pylint:disable=broad-except
   1146             if hasattr(e, "ag_error_metadata"):
-> 1147               raise e.ag_error_metadata.to_exception(e)
   1148             else:
   1149               raise

ValueError: in user code:
**ValueError: Shapes (None, 1) and (None, 5) are incompatible**

Solution

  • You have to map your labels to integer values:

    import numpy as np
    
    labels_index = dict(zip(["issue", "supporting", "decision", "neutral", "attacking"], np.arange(5)))
    
    y_train = [labels_index[y] for y in y_train]