
Caffe's transformer.preprocess takes too long to complete


I wrote a simple script to test a model using PyCaffe, but I noticed it is extremely slow, even on GPU! My test set has 82K samples of size 256x256, and when I run the code given below, it takes hours to complete.

I even used batches of images instead of individual ones, yet nothing changed. It has now been running for the past 5 hours, and only 50K samples have been processed! What should I do to make it faster?

Can I completely avoid using transformer.preprocess? If so, how?

Here is the snippet:

import numpy as np
import lmdb
import caffe

#run on gpu
caffe.set_mode_gpu()

#Extract mean from the mean image file
mean_blobproto_new = caffe.proto.caffe_pb2.BlobProto()
f = open(args.mean, 'rb')
mean_blobproto_new.ParseFromString(f.read())
mean_image = caffe.io.blobproto_to_array(mean_blobproto_new)
f.close()

predicted_lables = []
true_labels = []
misclassified =[]
class_names = ['unsafe','safe']
count = 0
correct = 0
batch=[]
plabe_ls = []
batch_size = 50

net1 = caffe.Net(args.proto, args.model, caffe.TEST) 
transformer = caffe.io.Transformer({'data': net1.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))  
transformer.set_mean('data', mean_image[0].mean(1).mean(1))
transformer.set_raw_scale('data', 255)      
transformer.set_channel_swap('data', (2,1,0)) 
net1.blobs['data'].reshape(batch_size, 3,224, 224)
data_blob_shape = net1.blobs['data'].data.shape
data_blob_shape = list(data_blob_shape)
i=0

mu = np.array([104, 117, 123]) #imagenet mean (not actually used below)

#check and see if its lmdb or leveldb
if(args.db_type.lower() == 'lmdb'):
    lmdb_env = lmdb.open(args.db_path)
    lmdb_txn = lmdb_env.begin()
    lmdb_cursor = lmdb_txn.cursor()
    for key, value in lmdb_cursor:
        count += 1 
        datum = caffe.proto.caffe_pb2.Datum()
        datum.ParseFromString(value)
        label = int(datum.label)
        image = caffe.io.datum_to_array(datum).astype(np.uint8)
        if(count % 5000 == 0):
            print('count: ',count)
        if(i < batch_size):
            i+=1
            inf= key,image,label
            batch.append(inf)
        if(i >= batch_size):
            #process n image 
            ims=[]
            for x in range(len(batch)):
                ims.append(transformer.preprocess('data',batch[x][1]))# - mean_image[0].mean(1).mean(1) )
            net1.blobs['data'].data[...] = ims[:]
            out_1 = net1.forward()
            plbl = np.asarray( out_1['pred'])   
            plbl = plbl.argmax(axis=1)
            for j in range(len(batch)):
                if (plbl[j] == batch[j][2]):
                    correct+=1
                else:
                    misclassified.append(batch[j][0])

                predicted_lables.append(plbl[j])
                true_labels.append(batch[j][2]) 
            batch.clear()
            i=0

Update:

By replacing

for x in range(len(batch)):
    ims.append(transformer.preprocess('data',batch[x][1]))
net1.blobs['data'].data[...] = ims[:]

with

for x in range(len(batch)):
    img = batch[x][1]
    ims.append(img[:,0:224,0:224])

82K samples were processed in less than a minute. The culprit is indeed the preprocess method, and I have no idea why it behaves like this!
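To see where the time actually goes, one rough check (a sketch, reusing the transformer, net1, and a single image from the snippet above) is to time one preprocess call against one forward pass:

import time

# Rough micro-benchmark; `transformer`, `net1`, and a single (C, H, W)
# `image` from the LMDB loop above are assumed to already be in scope.
t0 = time.time()
blob = transformer.preprocess('data', image)
t1 = time.time()

net1.blobs['data'].data[0, ...] = blob
t2 = time.time()
net1.forward()   # forward pass over the whole 50-image batch
t3 = time.time()

print('preprocess: {:.3f}s per image, forward: {:.3f}s per batch'.format(t1 - t0, t3 - t2))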

Anyway, I can't use the mean file this way. I tried

ims.append(img[:,0:224,0:224] - mean.mean(1).mean(1))

as well, but got this error:

ValueError: operands could not be broadcast together with shapes (3,224,224) (3,)

I also need to find a better way of cropping the image. I don't know whether I need to resize it back to 224, or whether I should use crops just like Caffe does.
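For what it's worth, that ValueError comes from subtracting a (3,)-shaped per-channel mean from a (3, 224, 224) array. One way to make the subtraction broadcast (a minimal sketch, reusing img and mean_image from the snippets above; per_channel_mean is just an illustrative name) is to reshape the mean to (3, 1, 1):

# Per-channel mean has shape (3,); reshaping to (3, 1, 1) lets NumPy
# broadcast it over the 224x224 spatial dimensions of the (C, H, W) crop.
per_channel_mean = mean_image[0].mean(1).mean(1).reshape(3, 1, 1)
ims.append(img[:, 0:224, 0:224].astype(np.float32) - per_channel_mean)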


Solution

  • I finally made it! Here is the code that runs much faster:

    import numpy as np
    import lmdb
    import caffe

    predicted_lables=[]
    true_labels = []
    misclassified =[]
    class_names = ['unsafe','safe']
    count =0
    correct = 0
    batch = []
    plabe_ls = []
    batch_size = 50
    cropx = 224
    cropy = 224
    i = 0
    
    # Extract mean from the mean image file
    mean_blobproto_new = caffe.proto.caffe_pb2.BlobProto()
    f = open(args.mean, 'rb')
    mean_blobproto_new.ParseFromString(f.read())
    mean_image = caffe.io.blobproto_to_array(mean_blobproto_new)
    f.close()
    
    caffe.set_mode_gpu() 
    net1 = caffe.Net(args.proto, args.model, caffe.TEST) 
    net1.blobs['data'].reshape(batch_size, 3, 224, 224)
    data_blob_shape = net1.blobs['data'].data.shape
    
    #check and see if its lmdb or leveldb
    if(args.db_type.lower() == 'lmdb'):
        lmdb_env = lmdb.open(args.db_path)
        lmdb_txn = lmdb_env.begin()
        lmdb_cursor = lmdb_txn.cursor()
        for key, value in lmdb_cursor:
            count += 1 
            datum = caffe.proto.caffe_pb2.Datum()
            datum.ParseFromString(value)
            label = int(datum.label)
            image = caffe.io.datum_to_array(datum).astype(np.float32)
            #key,image,label
            #buffer n image
            if(count % 5000 == 0):          
                print('{0} samples processed so far'.format(count))
            if(i < batch_size):
                i += 1
                inf= key,image,label
                batch.append(inf)
                #print(key)                 
            if(i >= batch_size):
                #process n image 
                ims=[]              
                for x in range(len(batch)):
                    img = batch[x][1]
                    #img has c,h,w shape! It has already gone through transpose
                    #and channel swap when it was saved into the lmdb!
                    #method I: crop the both the image and mean file 
                    #ims.append(img[:,0:224,0:224] - mean_image[0][:,0:224,0:224] )
                    #Method II : resize the image to the desired size(crop size) 
                    #img = caffe.io.resize_image(img.transpose(2,1,0), (224, 224))
                    #Method III : use center crop just like caffe does in test time
                    #center crop
                    c, h, w = img.shape
                    startx = w//2 - cropx//2
                    starty = h//2 - cropy//2
                    img = img[:, starty:starty + cropy, startx:startx + cropx]
                    #transpose the image so we can subtract from mean
                    img = img.transpose(2,1,0)
                    img -= mean_image[0].mean(1).mean(1)
                    #transpose back to the original state
                    img = img.transpose(2,1,0)
                    ims.append(img)

                net1.blobs['data'].data[...] = ims[:]
                out_1 = net1.forward()
                plabe_ls = out_1['pred']
                plbl = np.asarray(plabe_ls)
                plbl = plbl.argmax(axis=1)
                for j in range(len(batch)):
                    if (plbl[j] == batch[j][2]):
                        correct += 1
                    else:
                        misclassified.append(batch[j][0])
                        
                    predicted_lables.append(plbl[j])        
                    true_labels.append(batch[j][2]) 
                batch.clear()
                i = 0               
    

    Though I'm not getting exactly the same accuracy, it's pretty close: I get 98.61% instead of 98.65%. I don't know what is causing this difference!

    Update

    The reason transformer.preprocess took so long to complete was its resize_image() method. resize_image() needs the image to be in H,W,C form, whereas in my case the images had already been transposed and channel-swapped (into c,w,h form) when they were saved into the LMDB. This caused resize_image() to fall back to its slowest way of resizing the image, taking about 0.6 seconds per image.

    Knowing this, transposing the image back into the correct dimension order solves the issue, meaning I had to do:

    ims.append(transformer.preprocess('data',img.transpose(2,1,0))) 
    

    Note that this is still slower than the approach above, but it's much, much faster than before!
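    A small guard along these lines (a sketch, reusing img, ims, and the transformer set up in the first snippet) keeps the transformer's resize step on its fast path. Note that transpose(1, 2, 0) keeps height and width in their original order, while the transpose(2, 1, 0) used above also swaps them, which only works out here because the inputs are square:

    # Hypothetical guard: hand the transformer an (H, W, C) array,
    # which is the layout its internal resize step expects.
    if img.shape[0] == 3:                # (C, H, W) as read from the LMDB
        img = img.transpose(1, 2, 0)     # -> (H, W, C)
    ims.append(transformer.preprocess('data', img))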