i am pretty new to ML and trying to do an typical fashion_mnist Classification. The Problem is that the accuracy Score after I run the code is only 0.1 and the loss is below 0. So i guess the ML is not learning but I dont know what the Problem is? Thx
from tensorflow.keras.datasets import fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train.astype('float32')
print(type(x_train))
x_train =x_train.reshape(60000,784)
x_train = x_train / 255.0
x_test =x_test.reshape(60000,784)
x_test= x_test/ 255.0
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
model= Sequential()
model.add(Dense(100, activation="sigmoid", input_shape=(784,)))
model.add(Dense(1, activation="sigmoid"))
model.compile(optimizer='sgd', loss="binary_crossentropy", metrics=["accuracy"])
model.fit(
x_train,
y_train,
epochs=10,
batch_size=1000)
Output:
Multiple issues with your code -
x_test = x_test.reshape(10000,784)
as it has 10000 images only.sigmoid
activation in the first dense layer, which is not a good practice. Instead, use relu
.0-10
, a neural network can't predict integer values with a softmax
or sigmoid
activation. Instead what you are trying to do is predict the probability values for EACH of the 10 classes.softmax
.categorical_crossentropy
. Since your output is a 10-dimensional probability distribution, but your y_train is a single value for each class label, you can use sparse_categorical_crossentropy
instead which is the same thing but handles label encoded y.adam
.(28,28)
and this is a toy example, it's ok the way it is.Please refer to this table for checking out what to use. You have to ensure you know what problem you are solving in the first place though.
In your case, you want to do a multi-class single label classification but you are instead doing a multi-class multi-label classification by using the incorrect loss and output layer activation.
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
#Load data
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
#Normalize
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
#Reshape
x_train = x_train.reshape(60000,784)
x_train = x_train / 255.0
x_test = x_test.reshape(10000,784)
x_test = x_test / 255.0
print('Data shapes->',[i.shape for i in [x_train, y_train, x_test, y_test]])
#Contruct computation graph
model = Sequential()
model.add(Dense(100, activation="relu", input_shape=(784,)))
model.add(Dense(10, activation="softmax"))
#Compile with loss as cross_entropy and optimizer as adam
model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=["accuracy"])
#Fit model
model.fit(x_train, y_train, epochs=10, batch_size=1000)
Data shapes-> [(60000, 784), (60000,), (10000, 784), (10000,)]
Epoch 1/10
60/60 [==============================] - 0s 5ms/step - loss: 0.8832 - accuracy: 0.7118
Epoch 2/10
60/60 [==============================] - 0s 6ms/step - loss: 0.5125 - accuracy: 0.8281
Epoch 3/10
60/60 [==============================] - 0s 6ms/step - loss: 0.4585 - accuracy: 0.8425
Epoch 4/10
60/60 [==============================] - 0s 6ms/step - loss: 0.4238 - accuracy: 0.8547
Epoch 5/10
60/60 [==============================] - 0s 7ms/step - loss: 0.4038 - accuracy: 0.8608
Epoch 6/10
60/60 [==============================] - 0s 6ms/step - loss: 0.3886 - accuracy: 0.8656
Epoch 7/10
60/60 [==============================] - 0s 6ms/step - loss: 0.3788 - accuracy: 0.8689
Epoch 8/10
60/60 [==============================] - 0s 6ms/step - loss: 0.3669 - accuracy: 0.8725
Epoch 9/10
60/60 [==============================] - 0s 6ms/step - loss: 0.3560 - accuracy: 0.8753
Epoch 10/10
60/60 [==============================] - 0s 6ms/step - loss: 0.3451 - accuracy: 0.8794
I am also adding a code for your reference with Convolutional layers
, using categorical_crossentropy
and functional API
instead of Sequential. Please read the comments inline the code for more clarity. This should help you get an idea of some good practices when working with Keras.
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras import layers, Model, utils
#Load data
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
#Normalize
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
#Reshape
x_train = x_train.reshape(60000,28,28,1)
x_train = x_train / 255.0
x_test = x_test.reshape(10000,28,28,1)
x_test = x_test / 255.0
#Set y to onehot instead of label encoded
y_train = utils.to_categorical(y_train)
y_test = utils.to_categorical(y_test)
#print([i.shape for i in [x_train, y_train, x_test, y_test]])
#Contruct computation graph
inp = layers.Input((28,28,1))
x = layers.Conv2D(32, (3,3), activation='relu', padding='same')(inp)
x = layers.MaxPooling2D((2,2))(x)
x = layers.Conv2D(32, (3,3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2,2))(x)
x = layers.Flatten()(x)
out = Dense(10, activation='softmax')(x)
#Define model
model = Model(inp, out)
#Compile with loss as cross_entropy and optimizer as adam
model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["accuracy"])
#Fit model
model.fit(x_train, y_train, epochs=10, batch_size=1000)
utils.plot_model(model, show_layer_names=False, show_shapes=True)