python image opencv mnist image-preprocessing

Converting image into MNIST format


I have trained a KNN model to predict handwritten digits from the MNIST dataset, and now I want to test it on my own handwriting. To do that, I need to convert my image into the MNIST format (an array of 784 pixel values). I tried resizing the image to 28x28 pixels and storing the pixel intensities with the code below:

img = cv2.imread(image, cv2.IMREAD_GRAYSCALE)
resized = cv2.resize(img, (28, 28))
features = resized.reshape(1, -1)

However, this does not work: the prediction is always '0'. I think I need to shift my digit to the centre of the 28x28 image and convert it to grayscale. I also suspect that the image is not being converted into the array properly. How can I format my images correctly so that the predictions become more accurate?


Solution

  • The following code should work as is, though you may have to tweak it a little. The function names are self-explanatory.

    import numpy as np
    import cv2
    
    def resize_to_28x28(img):
        """Scale a grayscale image so its longer side is 26 pixels,
        then centre it on a black 28x28 canvas."""
        img_h, img_w = img.shape
    
        if img_w >= img_h:
            # Width is the longer side: scale it to 26, keep the aspect ratio.
            new_w, new_h = 26, (26 * img_h) // img_w
        else:
            # Height is the longer side.
            new_w, new_h = (26 * img_w) // img_h, 26
    
        if new_w <= 0 or new_h <= 0:
            raise ValueError(f"Invalid image dimensions: {img_w}x{img_h}")
    
        # cv2.resize takes dsize as (width, height). The interpolation flag is
        # passed by keyword because the fifth positional argument is fy.
        tmp_img = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_NEAREST)
    
        out_img = np.zeros((28, 28), dtype=np.uint8)
    
        # Offsets that centre the resized digit on the canvas.
        out_h, out_w = out_img.shape
        new_h, new_w = tmp_img.shape
        row_min = out_h // 2 - new_h // 2
        row_max = row_min + new_h
        col_min = out_w // 2 - new_w // 2
        col_max = col_min + new_w
    
        out_img[row_min:row_max, col_min:col_max] = tmp_img
    
        return out_img
    
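    def run_inference(img):
        tsr_img = resize_to_28x28(img)
        # Shape (batch, height, width, channels), which a typical TensorFlow
        # image model expects. For a KNN trained on flat MNIST rows, use
        # tsr_img.reshape(1, -1) instead.
        input_data = np.copy(tsr_img).reshape(1, 28, 28, 1)
    
        # floating_model, input_mean and input_std are not defined in this
        # snippet; they must be set elsewhere for the model in use.
        if floating_model:
            input_data = (np.float32(input_data) - input_mean) / input_std
    
        # Instantiate tensorflow interpreter and run inference on input_data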
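
One possible reason for constant predictions, offered as a guess rather than a diagnosis: MNIST digits are light strokes on a black background, while a photo or scan of handwriting is usually dark ink on a light page. The sketch below uses only NumPy to invert such an image, crop it to the bounding box of the strokes, and flatten a 28x28 result into the 1x784 row a KNN trained on flat MNIST vectors would take. The helper names (`invert_if_light_background`, `crop_to_content`, `to_mnist_row`), the mean cut-off of 127, and the default crop threshold are illustrative assumptions, not part of the original answer.

```python
import numpy as np


def invert_if_light_background(img):
    """Return a uint8 copy with MNIST polarity (dark background, light strokes).

    Assumption: the background covers most of the image, so a mean pixel
    value above 127 is taken to mean a light background.
    """
    img = np.asarray(img, dtype=np.uint8)
    if img.mean() > 127:
        return 255 - img
    return img.copy()


def crop_to_content(img, threshold=0):
    """Crop to the bounding box of pixels brighter than `threshold`.

    `threshold` is a hypothetical tuning knob: raise it if the background
    of a real scan is not exactly 0 after inversion.
    """
    mask = img > threshold
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return img  # blank image: nothing to crop
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def to_mnist_row(img_28x28):
    """Flatten a 28x28 image into a (1, 784) row vector."""
    img_28x28 = np.asarray(img_28x28)
    if img_28x28.shape != (28, 28):
        raise ValueError(f"expected shape (28, 28), got {img_28x28.shape}")
    return img_28x28.reshape(1, -1)
```

A plausible order of operations is to invert, crop, pass the cropped digit to `resize_to_28x28` from the answer, and then call `to_mnist_row` on the result before handing it to the KNN. If the model was trained on pixel values scaled to [0, 1] rather than 0 to 255, the row would also need dividing by 255.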