I am running inference with Python 2.7 and the MXNet v1.3.0 ML framework on an image classification model in ONNX format (v1.2.1 with opset 7), where I feed one image at a time to the inferrer. What do I need to do to run inference for multiple images asynchronously, but also wait for all of them to finish?
I am extracting frames as .jpeg images from a video at 30 FPS. So, for example, when I run the process on a video of length 20s, it generates 600 .jpeg images. For now, I iterate through a list of those images and pass a relative path to each of them to the following function, which then infers from the target image.
def infer(self, target_image_path):
    target_image_path = self.__output_directory + '/' + target_image_path
    image_data = self.__get_image_data(target_image_path)  # Get pixel data

    '''Define the model's input'''
    model_metadata = onnx_mxnet.get_model_metadata(self.__model)
    data_names = [inputs[0]
                  for inputs in model_metadata.get('input_tensor_data')]
    Batch = namedtuple('Batch', 'data')

    ctx = mx.eia()  # Set the context to elastic inference

    '''Load the model'''
    sym, arg, aux = onnx_mxnet.import_model(self.__model)
    mod = mx.mod.Module(symbol=sym, data_names=data_names,
                        context=ctx, label_names=None)
    mod.bind(data_shapes=[(data_names[0], image_data.shape)],
             label_shapes=None, for_training=False)
    mod.set_params(arg_params=arg, aux_params=aux,
                   allow_missing=True, allow_extra=True)

    '''Run inference on the image'''
    mod.forward(Batch([mx.nd.array(image_data)]))
    predictions = mod.get_outputs()[0].asnumpy()
    predictions = predictions[0].tolist()

    '''Apply emotion labels'''
    zipb_object = zip(self.__emotion_labels, predictions)
    prediction_dictionary = dict(zipb_object)
    return prediction_dictionary
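For context, the loop that currently drives this function looks roughly like this (image_paths and predictions are illustrative names, not my exact code):

predictions = []
for target_image_path in image_paths:  # ~600 relative .jpeg paths
    predictions.append(self.infer(target_image_path))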
The expected behavior would be to run inference for each image asynchronously, but also to wait for the entire batch to finish.
One thing you shouldn't do is load the model for every image. You should load the model once and then run inference on all 600 images.
For example, you can refactor your code like this:
def load_model(self, data_shape):
    '''Load the model once; data_shape (e.g. (1, 3, 224, 224)) is passed in
    because no image is available at load time to take the input shape from'''
    model_metadata = onnx_mxnet.get_model_metadata(self.__model)
    data_names = [inputs[0]
                  for inputs in model_metadata.get('input_tensor_data')]
    ctx = mx.eia()  # Set the context to elastic inference
    sym, arg, aux = onnx_mxnet.import_model(self.__model)
    mod = mx.mod.Module(symbol=sym, data_names=data_names,
                        context=ctx, label_names=None)
    mod.bind(data_shapes=[(data_names[0], data_shape)],
             label_shapes=None, for_training=False)
    mod.set_params(arg_params=arg, aux_params=aux,
                   allow_missing=True, allow_extra=True)
    return mod
def infer(self, mod, target_image_path):
    target_image_path = self.__output_directory + '/' + target_image_path
    image_data = self.__get_image_data(target_image_path)  # Get pixel data

    '''Run inference on the image'''
    Batch = namedtuple('Batch', 'data')  # defined here rather than inside load_model
    mod.forward(Batch([mx.nd.array(image_data)]))
    predictions = mod.get_outputs()[0].asnumpy()
    predictions = predictions[0].tolist()

    '''Apply emotion labels'''
    zipb_object = zip(self.__emotion_labels, predictions)
    prediction_dictionary = dict(zipb_object)
    return prediction_dictionary
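With this split, a minimal sketch of the calling code, which loads the model once and reuses it (the (1, 3, 224, 224) input shape and the image_paths list are assumptions; substitute your model's actual input shape and your own list of files):

mod = self.load_model(data_shape=(1, 3, 224, 224))  # assumed input shape
results = {}
for target_image_path in image_paths:
    results[target_image_path] = self.infer(mod, target_image_path)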
MXNet runs on an asynchronous engine; you don't have to wait for one image to finish processing before enqueuing the next one.
Some calls in MXNet are asynchronous: for example, mod.forward() returns immediately and does not wait for the result to be computed. Other calls are synchronous: for example, mod.get_outputs()[0].asnumpy() copies the data to the CPU, so it has to be synchronous. Having a synchronous call between each of your iterations slows down the processing a bit.
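A minimal sketch of that difference, assuming mod is already bound with its parameters set and dummy_batch is any input batch of the right shape (both names are placeholders):

import time

start = time.time()
mod.forward(dummy_batch)  # asynchronous: only enqueues the computation
enqueue_time = time.time() - start

out = mod.get_outputs()[0].asnumpy()  # synchronous: blocks until the result is on the CPU
total_time = time.time() - start

print('enqueue: %.4fs, enqueue + wait for result: %.4fs' % (enqueue_time, total_time))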
Assuming you have access to the list of image_paths, you can process them like this to minimize the waiting time, with only a single synchronization point at the end:
results = []
for target_image_path in image_paths:
    image_data = self.__get_image_data(target_image_path)  # Get pixel data
    '''Run inference on the image'''
    mod.forward(Batch([mx.nd.array(image_data)]))
    # copy() is also asynchronous; it keeps this output from being overwritten
    # by the next forward pass without adding a blocking call inside the loop
    results.append(mod.get_outputs()[0].copy())
# The asnumpy() calls below are the single synchronization point
predictions = [result.asnumpy()[0].tolist() for result in results]
You can read more about asynchronous programming with MXNet here: http://d2l.ai/chapter_computational-performance/async-computation.html
Even better: if you know you have N images to process, you can group them into batches of, for example, 16 to increase the parallelism of the processing. However, doing so will increase memory consumption. Since you seem to be using an Elastic Inference context, your overall memory will be limited, and I would advise sticking with a smaller batch size so you don't risk running out of memory.
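A rough sketch of that batching idea, assuming the module was bound with a batch dimension of batch_size, that self.__get_image_data returns an array shaped (1, C, H, W), and that all images share the same shape (batch_size and the padding of the last chunk are illustrative, not part of your code):

import numpy as np

batch_size = 16  # illustrative value; tune it against the memory you have available
results = []
for i in range(0, len(image_paths), batch_size):
    chunk = image_paths[i:i + batch_size]
    # Stack the per-image arrays into a single (batch, C, H, W) array
    batch_data = np.concatenate(
        [self.__get_image_data(p) for p in chunk], axis=0)
    # The last chunk may be smaller than batch_size, so pad it with zeros
    # to match the shape the module was bound with
    pad = batch_size - batch_data.shape[0]
    if pad > 0:
        batch_data = np.concatenate(
            [batch_data,
             np.zeros((pad,) + batch_data.shape[1:], dtype=batch_data.dtype)],
            axis=0)
    mod.forward(Batch([mx.nd.array(batch_data)]))
    # copy() keeps the outputs around without blocking inside the loop
    results.append((mod.get_outputs()[0].copy(), len(chunk)))
# Single synchronization point: pull everything back to the CPU at the end
predictions = []
for output, n in results:
    predictions.extend(output.asnumpy()[:n].tolist())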