
fashion_mnist Data ML Accuracy Score is only 0.1


I am pretty new to ML and am trying to do a typical fashion_mnist classification. The problem is that the accuracy score after I run the code is only 0.1 and the loss is below 0, so I guess the model is not learning, but I don't know what the problem is. Thanks!

from tensorflow.keras.datasets import fashion_mnist 
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

x_train = x_train.astype('float32')
print(type(x_train))
x_train = x_train.reshape(60000,784)
x_train = x_train / 255.0
x_test = x_test.reshape(60000,784)
x_test = x_test / 255.0


from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model= Sequential()
model.add(Dense(100, activation="sigmoid", input_shape=(784,)))
model.add(Dense(1, activation="sigmoid"))
model.compile(optimizer='sgd', loss="binary_crossentropy", metrics=["accuracy"])

model.fit(
    x_train,
    y_train,
    epochs=10,
    batch_size=1000)

Output:

[screenshot of the training log: accuracy stays around 0.1 and the loss goes negative]


Solution

  • There are multiple issues with your code:

    1. There is an error in your reshape of the test set: it should be x_test = x_test.reshape(10000,784), since the test set has only 10000 images.
    2. You are using a sigmoid activation in the first dense layer, which is not a good practice. Instead, use relu.
    3. Your output Dense layer has only 1 node, but you are working with a dataset that has 10 unique classes, so the output layer has to be Dense(10). Note that even though y_train contains the integer labels 0-9, a neural network with a softmax or sigmoid activation can't predict integer values directly; what you are really trying to do is predict a probability for EACH of the 10 classes.
    4. You are using the incorrect activation in the final layer for multi-class classification. Use softmax.
    5. You are using the incorrect loss function. For multi-class classification use categorical_crossentropy. Since your output is a 10-dimensional probability distribution but your y_train holds a single integer label per sample, you can use sparse_categorical_crossentropy instead, which computes the same thing but accepts label-encoded y (see the sketch right after this list).
    6. Try using a better optimizer, such as adam, to avoid getting stuck in local minima.
    7. CNNs are generally preferred for image data, since a plain Dense layer cannot capture the spatial features that make up an image. But since the images are small (28,28) and this is a toy example, it's OK the way it is.
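
    To make point 5 concrete, here is a minimal sketch (assuming TensorFlow 2.x) showing that the two losses agree when each gets the label encoding it expects:

    import numpy as np
    import tensorflow as tf
    
    y_label = np.array([3])                            #integer-encoded class label
    y_onehot = tf.one_hot(y_label, depth=10)           #its one-hot equivalent
    y_pred = tf.nn.softmax(tf.random.normal((1, 10)))  #dummy 10-class probabilities
    
    sparse = tf.keras.losses.sparse_categorical_crossentropy(y_label, y_pred)
    dense = tf.keras.losses.categorical_crossentropy(y_onehot, y_pred)
    print(sparse.numpy()[0], dense.numpy()[0])         #identical values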

    Please refer to this table to check what to use. You have to make sure you know what problem you are solving in the first place, though.

    Problem type                        Last-layer activation    Loss function
    ----------------------------------  -----------------------  -------------------------
    Binary classification               sigmoid                  binary_crossentropy
    Multi-class, single-label           softmax                  categorical_crossentropy
    Multi-class, multi-label            sigmoid                  binary_crossentropy
    Regression to arbitrary values      none                     mse

    In your case, you want to do multi-class single-label classification, but by using the incorrect loss and output-layer activation you are instead setting up a multi-class multi-label problem. The sketch below shows why that combination also drives your loss negative.
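
    A quick sketch (again assuming TensorFlow 2.x): binary_crossentropy treats y_true as a probability in [0,1], so feeding it a raw class label like 7 flips the sign of the (1-y)*log(1-p) term and the loss can go below zero. It likely also explains the 0.1 accuracy: the rounded sigmoid output is always 0 or 1, so it can only ever match the roughly 10% of samples labeled 0 or the 10% labeled 1.

    import tensorflow as tf
    
    y_true = tf.constant([[7.0]])  #class label misused as a binary target
    y_pred = tf.constant([[0.9]])  #a single sigmoid output
    print(tf.keras.losses.binary_crossentropy(y_true, y_pred).numpy()[0])  #~ -13.08

    With that cleared up, here is your code with the fixes applied: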

    from tensorflow.keras.datasets import fashion_mnist 
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    
    #Load data
    (x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
    
    #Normalize
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    
    #Reshape
    x_train = x_train.reshape(60000,784)
    x_train = x_train / 255.0
    x_test = x_test.reshape(10000,784)
    x_test = x_test / 255.0
    
    print('Data shapes->',[i.shape for i in [x_train, y_train, x_test, y_test]])
    
    #Construct computation graph
    model = Sequential()
    model.add(Dense(100, activation="relu", input_shape=(784,)))
    model.add(Dense(10, activation="softmax"))
    
    #Compile with loss as cross_entropy and optimizer as adam
    model.compile(optimizer='adam', loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    
    #Fit model
    model.fit(x_train, y_train, epochs=10, batch_size=1000)
    
    Data shapes-> [(60000, 784), (60000,), (10000, 784), (10000,)]
    Epoch 1/10
    60/60 [==============================] - 0s 5ms/step - loss: 0.8832 - accuracy: 0.7118
    Epoch 2/10
    60/60 [==============================] - 0s 6ms/step - loss: 0.5125 - accuracy: 0.8281
    Epoch 3/10
    60/60 [==============================] - 0s 6ms/step - loss: 0.4585 - accuracy: 0.8425
    Epoch 4/10
    60/60 [==============================] - 0s 6ms/step - loss: 0.4238 - accuracy: 0.8547
    Epoch 5/10
    60/60 [==============================] - 0s 7ms/step - loss: 0.4038 - accuracy: 0.8608
    Epoch 6/10
    60/60 [==============================] - 0s 6ms/step - loss: 0.3886 - accuracy: 0.8656
    Epoch 7/10
    60/60 [==============================] - 0s 6ms/step - loss: 0.3788 - accuracy: 0.8689
    Epoch 8/10
    60/60 [==============================] - 0s 6ms/step - loss: 0.3669 - accuracy: 0.8725
    Epoch 9/10
    60/60 [==============================] - 0s 6ms/step - loss: 0.3560 - accuracy: 0.8753
    Epoch 10/10
    60/60 [==============================] - 0s 6ms/step - loss: 0.3451 - accuracy: 0.8794
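
    Once training looks healthy, it's worth checking generalization on the held-out test split prepared above (a short sketch using the model just trained):

    import numpy as np
    
    test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"test accuracy: {test_acc:.4f}")
    
    #The model outputs a 10-way probability distribution; argmax recovers the label
    probs = model.predict(x_test[:1])
    print("predicted:", np.argmax(probs, axis=-1)[0], "| true:", y_test[0])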
    

    I am also adding code for your reference with convolutional layers, using categorical_crossentropy and the functional API instead of Sequential. Please read the inline comments in the code for more clarity. This should help you get an idea of some good practices when working with Keras.

    from tensorflow.keras.datasets import fashion_mnist 
    from tensorflow.keras import layers, Model, utils
    
    #Load data
    (x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
    
    #Normalize
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    
    #Reshape
    x_train = x_train.reshape(60000,28,28,1)
    x_train = x_train / 255.0
    x_test = x_test.reshape(10000,28,28,1)
    x_test = x_test / 255.0
    
    #Set y to onehot instead of label encoded
    y_train = utils.to_categorical(y_train)
    y_test = utils.to_categorical(y_test)
    
    #print([i.shape for i in [x_train, y_train, x_test, y_test]])
    
    #Construct computation graph
    inp = layers.Input((28,28,1))
    x = layers.Conv2D(32, (3,3), activation='relu', padding='same')(inp)
    x = layers.MaxPooling2D((2,2))(x)
    x = layers.Conv2D(32, (3,3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2,2))(x)
    x = layers.Flatten()(x)
    out = layers.Dense(10, activation='softmax')(x)
    
    #Define model
    model = Model(inp, out)
    
    #Compile with loss as cross_entropy and optimizer as adam
    model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["accuracy"])
    
    #Fit model
    model.fit(x_train, y_train, epochs=10, batch_size=1000)
    
    utils.plot_model(model, show_layer_names=False, show_shapes=True)
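
    Since y_test is now one-hot encoded, recover class labels with argmax when inspecting predictions (a sketch under the same assumptions as above):

    import numpy as np
    
    test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"CNN test accuracy: {test_acc:.4f}")
    
    #y_test was one-hot encoded above, so undo it with argmax for comparison
    pred_labels = np.argmax(model.predict(x_test[:5]), axis=-1)
    true_labels = np.argmax(y_test[:5], axis=-1)
    print("predicted:", pred_labels, "| true:", true_labels)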
    

    [image: plot_model diagram of the CNN architecture with layer shapes]