tensorflow machine-learning dataset classification tensorflow-datasets

How do I build my own handwritten digits dataset?

I have a set of images of numbers from 0 to 20, including intermediate values (0,25 / 0,5 / 0,75). Each number is treated as a class of its own, and I have 22 images per class.

These images will be used to train and test a convolutional neural network for classification. I'm not worried about accuracy: this is only a proof of concept, and I realise the dataset is too small for any reliable outcome.

EDIT: As suggested by @Kaveh, I checked out ImageDataGenerator.flow_from_directory

As far as I could tell, this is used to increase your dataset size using data augmentation. However, what I'm asking is: now that I have these images in different folders (22 images per folder, each folder making up a class), how do I use them? I've always loaded a single file that contains the whole dataset (for example MNIST, through Keras). I've never used my own data and therefore have no idea what the next step is.

Solution

Organize your directories as shown below, with one subfolder per class:

data_dir
-----train_dir
---------zero_dir
-------------first_zero_image.jpg
-------------second_zero_image.jpg
...
-------------twenty_second_zero_image.jpg
---------one_dir
-------------first_one_image.jpg
-------------second_one_image.jpg
...
-------------twenty_second_one_image.jpg
......
---------twenty_dir
-------------first_20_image.jpg
-------------second_20_image.jpg
...
-------------twenty_second_20_image.jpg
-----test_dir
---------zero_dir
# Structure the test directory like the train directory (same class
# folder names) and put your test images in it.
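
Before wiring this into Keras, it can help to confirm that the folders on disk really look like the layout above. The following is only a small sketch using the standard library. The helper names count_images_per_class and find_layout_problems and the set of image extensions are my own assumptions, not part of Keras. It counts the images in each class folder and reports class folders that exist in only one of train_dir and test_dir.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your files.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(split_dir):
    """Return {class_folder_name: number_of_images} for one split folder."""
    counts = {}
    for class_dir in Path(split_dir).iterdir():
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTS
            )
    return counts


def find_layout_problems(data_dir):
    """Return a list of human-readable problems found in the layout."""
    train = count_images_per_class(Path(data_dir) / "train_dir")
    test = count_images_per_class(Path(data_dir) / "test_dir")
    problems = []
    # Class folders should have the same names in both splits.
    for name in sorted(set(train) ^ set(test)):
        where = "train_dir" if name in train else "test_dir"
        problems.append(f"class folder '{name}' only exists in {where}")
    # An empty class folder would give that class no images at all.
    for split_name, counts in (("train_dir", train), ("test_dir", test)):
        for name, n in sorted(counts.items()):
            if n == 0:
                problems.append(f"{split_name}/{name} contains no images")
    return problems
```

Calling find_layout_problems("data_dir") should return an empty list when both splits have the same class folders and none of them is empty.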

Now you can use the Keras ImageDataGenerator.flow_from_directory to provide the data for model.fit.

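import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator

data_dir = 'data_dir'  # path to the top-level folder shown above
train_path = os.path.join(data_dir, 'train_dir')
# rescale pixel values to [0, 1] and hold out 20% of the training images for validation
gen = ImageDataGenerator(rescale=1/255, validation_split=.2)
train_gen = gen.flow_from_directory(train_path,
                                    target_size=(256, 256),
                                    color_mode="rgb",
                                    classes=None,  # infer the classes from the subfolder names
                                    class_mode="categorical",  # one-hot encoded labels
                                    batch_size=32,
                                    shuffle=True,
                                    seed=123,
                                    subset='training')
valid_gen = gen.flow_from_directory(train_path,
                                    target_size=(256, 256),
                                    color_mode="rgb",
                                    classes=None,
                                    class_mode="categorical",
                                    batch_size=32,
                                    shuffle=False,  # keep validation order fixed
                                    subset='validation')
# model is your compiled Keras CNN; its output layer needs one unit per class
history = model.fit(train_gen, epochs=20, validation_data=valid_gen)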

That should do it.
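
A note on the numbers, since the dataset is so small: as far as I can tell from the Keras source, validation_split is applied inside each class folder, and the validation subset takes the first int(validation_split * n) files of that folder, with the rest going to training. The sketch below just redoes that arithmetic. split_counts is a made-up helper, and the rounding rule is my reading of the implementation, so treat the result as an estimate.

```python
def split_counts(images_per_class, validation_split):
    """Estimate (training, validation) image counts for one class folder.

    Assumption: the validation subset gets int(validation_split * n)
    images and the training subset gets the remainder.
    """
    n_valid = int(validation_split * images_per_class)
    n_train = images_per_class - n_valid
    return n_train, n_valid


n_train, n_valid = split_counts(22, 0.2)
print(n_train, n_valid)  # prints: 18 4
```

So if all 22 images of a class sit in train_dir, expect roughly 18 of them for training and 4 for validation per class.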