I have my pre-processed image data in a numpy array, and my script works fine on a single GPU when I feed the numpy array directly. From what I understand, we need to create a MinibatchSource for multi-GPU training. I'm looking at the distributed-training example (ConvNet_CIFAR10_DataAug_Distributed.py), but it uses *_map.txt files, which are basically lists of paths to image files (e.g. png). I'm wondering what the best way is to create a MinibatchSource from a numpy array, instead of converting the numpy array back to png files.
You can create a composite reader that combines multiple image deserializers into one source. First you need to create two map files (with dummy labels): one listing all input images and the other listing the corresponding target images. The following code is a minimal implementation, assuming the files are called map1.txt and map2.txt:
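In case the map-file format is unfamiliar: each line is a file path, a tab, and an integer label (the label is unused here, so any dummy value works). A minimal sketch for generating both files; the directory names, file names, and the dummy label 0 are illustrative assumptions:

```python
# Sketch: write CNTK-style map files, one "path<TAB>label" line per image.
# The paths and the dummy label value are illustrative assumptions.
def write_map_file(map_path, image_paths, dummy_label=0):
    with open(map_path, "w") as f:
        for p in image_paths:
            f.write("%s\t%d\n" % (p, dummy_label))

# Inputs and their corresponding targets must be listed in matching order,
# so that line i of map1.txt pairs with line i of map2.txt.
inputs = ["input/img_%03d.png" % i for i in range(3)]
targets = ["target/img_%03d.png" % i for i in range(3)]
write_map_file("map1.txt", inputs)
write_map_file("map2.txt", targets)
```

Note that randomization happens inside the MinibatchSource, so the map files themselves can be in any fixed order, as long as the two files stay aligned line by line.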
import cntk as C
import cntk.io.transforms as xforms
import sys

def create_reader(map_file1, map_file2):
    # Resize every image to 3 x 224 x 224 on the fly
    transforms = [xforms.scale(width=224, height=224, channels=3, interpolations='linear')]
    source1 = C.io.ImageDeserializer(map_file1, C.io.StreamDefs(
        source_image=C.io.StreamDef(field='image', transforms=transforms)))
    source2 = C.io.ImageDeserializer(map_file2, C.io.StreamDefs(
        target_image=C.io.StreamDef(field='image', transforms=transforms)))
    # Both deserializers are combined into a single randomized source
    return C.io.MinibatchSource([source1, source2], max_samples=sys.maxsize, randomize=True)
x = C.input_variable((3, 224, 224))
y = C.input_variable((3, 224, 224))

# world's simplest model
model = C.layers.Convolution((3, 3), 3, pad=True)
z = model(x)
loss = C.squared_error(z, y)

reader = create_reader("map1.txt", "map2.txt")
trainer = C.Trainer(z, loss, C.sgd(z.parameters, C.learning_rate_schedule(0.00001, C.UnitType.minibatch)))
minibatch_size = 2

# Map each input variable to the corresponding stream of the reader
input_map = {
    x: reader.streams.source_image,
    y: reader.streams.target_image
}

for i in range(30):
    data = reader.next_minibatch(minibatch_size, input_map=input_map)
    print(data)
    trainer.train_minibatch(data)