Tags: python, deep-learning, binary, dataset, conv-neural-network

Simple CNN Binary Classification Network with a dataset of more than 100,000 image files


I am trying to build a simple CNN model for binary classification, but the training dataset consists of over 100k '.png' files. If I load all of the data into memory at once, training fails with an out-of-memory error. Can somebody help me build the network so it can handle such a large dataset?


Solution

  • You can stream the images lazily with a generator (the yield statement):

    import numpy as np
    from PIL import Image
    
    def load(image_name):
        return np.asarray(Image.open(image_name))  # assumes Pillow-readable PNGs
    
    def load_at_once(image_names):
        return [load(image_name) for image_name in image_names]  # loads everything at once: exhausts memory
    
    def load_stream(image_names):
        for image_name in image_names:
            yield load(image_name)  # loads one image at a time
    

    You can then iterate over the images with an ordinary for statement. load_stream loads one image at a time, which avoids memory exhaustion as long as you don't accumulate all the images in a list yourself; a mini-batch version is sketched after the next paragraph.

    Of course, streaming is slower than keeping everything in memory when each image is used more than once (for example, across several training epochs), because the file is re-read from disk every time it is needed.
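
    To feed a CNN you usually want mini-batches rather than single images. Below is a minimal sketch that builds on load_stream above; batch_stream, labels, and batch_size are illustrative names not in the original answer, and it assumes every image decodes to the same shape:

    import numpy as np
    
    def batch_stream(image_names, labels, batch_size=32):
        batch_images, batch_labels = [], []
        for image, label in zip(load_stream(image_names), labels):
            batch_images.append(image)
            batch_labels.append(label)
            if len(batch_images) == batch_size:
                yield np.stack(batch_images), np.asarray(batch_labels)
                batch_images, batch_labels = [], []
        if batch_images:  # don't drop the final partial batch
            yield np.stack(batch_images), np.asarray(batch_labels)

    Each yielded pair can be passed to a training step such as Keras's model.train_on_batch, so only batch_size decoded images are held in memory at any moment.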
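
    If you happen to be using TensorFlow/Keras (the question does not say which framework), the same streaming idea is built into the tf.data API, which also parallelizes the decoding. This is a sketch under assumptions: image_paths and labels are pre-built Python lists, and 128x128 is an arbitrary target size:

    import tensorflow as tf
    
    def make_dataset(image_paths, labels, batch_size=32):
        def decode(path, label):
            image = tf.io.read_file(path)                   # read raw bytes from disk
            image = tf.image.decode_png(image, channels=3)  # decode a single PNG
            image = tf.image.resize(image, [128, 128]) / 255.0
            return image, label
        ds = tf.data.Dataset.from_tensor_slices((image_paths, labels))
        ds = ds.map(decode, num_parallel_calls=tf.data.AUTOTUNE)
        return ds.shuffle(1024).batch(batch_size).prefetch(tf.data.AUTOTUNE)

    Passing the result to model.fit streams batches from disk on every epoch instead of holding 100,000+ decoded images in RAM.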