Tags: python, python-3.x, tensorflow, machine-learning, tensorflow2.0

Why do I get ValueError: Unrecognized data type: x=[...] (of type <class 'list'>) with model.fit() in TensorFlow?


I tried to run the code below, taken from CS50's AI course:

import csv
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Read data in from file
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)

    data = []
    for row in reader:
        data.append(
            {
                "evidence": [float(cell) for cell in row[:4]],
                "label": 1 if row[4] == "0" else 0,
            }
        )

# Separate data into training and testing groups
evidence = [row["evidence"] for row in data]
labels = [row["label"] for row in data]
X_training, X_testing, y_training, y_testing = train_test_split(
    evidence, labels, test_size=0.4
)

# Create a neural network
model = tf.keras.models.Sequential()

# Add a hidden layer with 8 units, with ReLU activation
model.add(tf.keras.layers.Dense(8, input_shape=(4,), activation="relu"))

# Add output layer with 1 unit, with sigmoid activation
model.add(tf.keras.layers.Dense(1, activation="sigmoid"))

# Train neural network
model.compile(
    optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
)
model.fit(X_training, y_training, epochs=20)

# Evaluate how well model performs
model.evaluate(X_testing, y_testing, verbose=2)

However, I get the following error:

Traceback (most recent call last):
  File "C:\Users\Eric\Desktop\coding\cs50\ai\lectures\lecture5\banknotes\banknotes.py", line 41, in <module>
    model.fit(X_training, y_training, epochs=20)
  File "C:\Users\Eric\Desktop\coding\cs50\ai\.venv\Lib\site-packages\keras\src\utils\traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\Eric\Desktop\coding\cs50\ai\.venv\Lib\site-packages\keras\src\trainers\data_adapters\__init__.py", line 113, in get_data_adapter
    raise ValueError(f"Unrecognized data type: x={x} (of type {type(x)})")
ValueError: Unrecognized data type: x=[...] (of type <class 'list'>)

where "..." is the training data.

Any idea what went wrong? I'm using Python version 3.11.8 and TensorFlow version 2.16.1 on a Windows computer.

I tried running the same code in a Google Colab notebook, and it works: the problem only occurs on my local machine. This is the output I'm expecting:

Epoch 1/20
26/26 [==============================] - 1s 2ms/step - loss: 1.1008 - accuracy: 0.5055
Epoch 2/20
26/26 [==============================] - 0s 2ms/step - loss: 0.8588 - accuracy: 0.5334
Epoch 3/20
26/26 [==============================] - 0s 2ms/step - loss: 0.6946 - accuracy: 0.5917
Epoch 4/20
26/26 [==============================] - 0s 2ms/step - loss: 0.5970 - accuracy: 0.6683
Epoch 5/20
26/26 [==============================] - 0s 2ms/step - loss: 0.5265 - accuracy: 0.7120
Epoch 6/20
26/26 [==============================] - 0s 2ms/step - loss: 0.4717 - accuracy: 0.7655
Epoch 7/20
26/26 [==============================] - 0s 2ms/step - loss: 0.4258 - accuracy: 0.8177
Epoch 8/20
26/26 [==============================] - 0s 2ms/step - loss: 0.3861 - accuracy: 0.8433
Epoch 9/20
26/26 [==============================] - 0s 2ms/step - loss: 0.3521 - accuracy: 0.8615
Epoch 10/20
26/26 [==============================] - 0s 2ms/step - loss: 0.3226 - accuracy: 0.8870
Epoch 11/20
26/26 [==============================] - 0s 2ms/step - loss: 0.2960 - accuracy: 0.9028
Epoch 12/20
26/26 [==============================] - 0s 2ms/step - loss: 0.2722 - accuracy: 0.9125
Epoch 13/20
26/26 [==============================] - 0s 2ms/step - loss: 0.2506 - accuracy: 0.9283
Epoch 14/20
26/26 [==============================] - 0s 2ms/step - loss: 0.2306 - accuracy: 0.9514
Epoch 15/20
26/26 [==============================] - 0s 3ms/step - loss: 0.2124 - accuracy: 0.9660
Epoch 16/20
26/26 [==============================] - 0s 2ms/step - loss: 0.1961 - accuracy: 0.9769
Epoch 17/20
26/26 [==============================] - 0s 2ms/step - loss: 0.1813 - accuracy: 0.9781
Epoch 18/20
26/26 [==============================] - 0s 2ms/step - loss: 0.1681 - accuracy: 0.9793
Epoch 19/20
26/26 [==============================] - 0s 2ms/step - loss: 0.1562 - accuracy: 0.9793
Epoch 20/20
26/26 [==============================] - 0s 2ms/step - loss: 0.1452 - accuracy: 0.9830
18/18 - 0s - loss: 0.1407 - accuracy: 0.9891 - 187ms/epoch - 10ms/step
[0.14066053926944733, 0.9890710115432739]
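
Because the script behaves differently in Colab and on the local machine, one way to narrow the problem down is to print the installed package versions in both environments and compare them. The following is a minimal sketch using only the standard library; the helper name installed_versions and the list of packages to check are illustrative assumptions, not part of the original script.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version,
    or to None when the package is not installed in this environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of packages worth comparing; adjust as needed.
for name, version in installed_versions(["tensorflow", "keras", "numpy"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in both places shows whether the two environments actually use the same TensorFlow and Keras releases.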

Solution

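  • https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

    You're giving Model.fit() the wrong type: X_training and y_training are plain Python lists. TensorFlow 2.16 ships with Keras 3, and the traceback shows its data adapter rejecting a list. Older TensorFlow releases converted lists for you, which is most likely why the same code runs in Colab.

    What I almost always do before handing data off to train_test_split is convert my features and labels to NumPy arrays.

    So you can either convert them before handing them off to train_test_split (it then returns arrays as well) or convert them right before model.fit(...).

    NOTE: Don't forget to add import numpy as np

    So in your case you'd do:

    X_training_np = np.array(X_training)
    y_training_np = np.array(y_training)
    
    model.fit(X_training_np, y_training_np, epochs=...)

    The same should apply to model.evaluate(): convert X_testing and y_testing too, since they are lists as well and would be expected to fail the same way.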
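
The "convert before the split" option can be sketched as follows. This is only an illustration: the helper name to_arrays, the float32 dtype, and the three sample rows are assumptions made up for the example, not values from banknotes.csv.

```python
import numpy as np


def to_arrays(evidence, labels):
    """Convert a list of feature rows and a list of labels to NumPy arrays."""
    X = np.array(evidence, dtype="float32")  # shape: (n_samples, n_features)
    y = np.array(labels, dtype="float32")    # shape: (n_samples,)
    return X, y


# Hypothetical stand-in data: 4 features per row, label 0 or 1.
evidence = [
    [3.6, 8.7, -2.8, -0.4],
    [-1.4, 3.3, -1.4, -2.0],
    [0.3, -4.5, 4.6, 1.0],
]
labels = [1, 0, 1]

X, y = to_arrays(evidence, labels)
print(X.shape, X.dtype)  # (3, 4) float32
print(y.shape, y.dtype)  # (3,) float32
```

Passing X and y to train_test_split in place of the original lists gives back NumPy arrays for all four splits, so both model.fit() and model.evaluate() receive arrays without any further conversion.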