I am trying to build a simple CNN model for binary classification, but the training dataset consists of over 100k '.png' files. If I load all the data at once for training, I get a memory exhaustion error. Can somebody help me build the network so it can deal with such a huge dataset?
You can stream the images with a yield statement:
def load_at_once(image_names):
    # load() reads one image from disk; holding all 100k+ results at once exhausts memory
    return [load(image_name) for image_name in image_names]

def load_stream(image_names):
    # generator: loads and yields one image at a time instead
    for image_name in image_names:
        yield load(image_name)
You can then iterate over the images with a for statement. The load_stream function loads the images one by one, which prevents memory exhaustion as long as you don't accumulate all of them in memory yourself.
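For training, you can wrap the streaming loader in a second generator that yields batches and pass it to your framework's training loop. Here is a minimal sketch assuming Keras/TensorFlow (the question doesn't say which framework you use); load, batch_stream, batch_size and the 128x128 size are illustrative names and values, not anything fixed:

import numpy as np
from PIL import Image  # assuming Pillow is used to decode the .png files

def load(image_name):
    # decode one png, resize so every array has the same shape, scale to [0, 1]
    img = Image.open(image_name).convert("RGB").resize((128, 128))
    return np.asarray(img, dtype=np.float32) / 255.0

def batch_stream(image_names, labels, batch_size=32):
    # yields (images, labels) batches forever, keeping only batch_size images in memory
    while True:
        for start in range(0, len(image_names), batch_size):
            names = image_names[start:start + batch_size]
            x = np.stack([load(name) for name in names])
            y = np.asarray(labels[start:start + batch_size], dtype=np.float32)
            yield x, y

# model is your compiled binary-classification CNN (sigmoid output, binary_crossentropy);
# because the generator is infinite, tell fit how many batches make one epoch:
# model.fit(batch_stream(train_names, train_labels, batch_size=32),
#           steps_per_epoch=len(train_names) // 32, epochs=10)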
Of course, streaming is slower than keeping everything in memory when you use the images more than once (e.g. over several epochs), because each image is read from disk every time it is needed.
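If you are on TensorFlow, one way to offset that cost is to let tf.data do the loading and prefetching. A sketch, assuming TF 2.x and a "data/train" directory with one sub-folder per class (both the path and the 180x180 size are placeholders):

import tensorflow as tf

# builds batches of decoded pngs straight from disk
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",
    label_mode="binary",      # two classes -> 0/1 labels
    image_size=(180, 180),
    batch_size=32,
)

# overlap disk reads with training; .cache("some_file") would additionally
# avoid re-decoding on later epochs without holding everything in RAM
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)

# model.fit(train_ds, epochs=10)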