Tags: amazon-s3, resnet, amazon-sagemaker

SageMaker image classification: best way to perform inference on many images in S3?


I trained a model with the built-in ResNet-18 Docker image, and now I want to deploy the model to an endpoint and classify ~1 million images. I have all my training, validation, and test images stored on S3 in RecordIO format (converted with im2rec.py). According to the docs:

The Amazon SageMaker Image Classification algorithm supports both RecordIO (application/x-recordio) and image (application/x-image) content types for training. The algorithm supports only application/x-image for inference.

So I cannot perform inference on my training data in RecordIO format. To get around this, I copied all the raw .jpg images (~2 GB) onto my SageMaker Jupyter notebook instance and performed inference one image at a time, as follows:

import os
import boto3

runtime = boto3.client('sagemaker-runtime')
endpoint_name = 'my-endpoint'       # name of the deployed endpoint

img_list = os.listdir('temp_data')  # list of all ~1,000,000 images

for im in img_list:
    # Read the raw image bytes and send them to the endpoint one at a time
    with open('temp_data/' + im, 'rb') as f:
        payload = bytearray(f.read())
    response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                       ContentType='application/x-image',
                                       Body=payload)
    # etc...

Needless to say, transferring all the data onto my notebook instance took a long time, and I would prefer not to have to do that before running inference. Why does the SageMaker Image Classification algorithm not support RecordIO for inference? And more importantly, what is the best way to run inference on many images without having to move them from S3?


Solution

  • The RecordIO format is designed to pack a large number of images into a single file, so I don't think it would work well for predicting single images.

    When it comes to prediction, you definitely don't have to copy images to a notebook instance or to S3. You just have to load them from anywhere and inline them in your prediction requests.

    If you want HTTP-based prediction, here are your options:

    1. Use the SageMaker SDK Predictor.predict() API on any machine (as long as it has proper AWS credentials) https://github.com/aws/sagemaker-python-sdk

    2. Use the AWS Python SDK (aka boto3) API invoke_endpoint() on any machine (as long as it has proper AWS credentials); see the sketch right after this list.
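
    As an illustration of option 2, here is a minimal sketch that streams each image's bytes straight from S3 and sends them to the endpoint, so nothing ever needs to be copied to a notebook instance. The bucket, prefix, and endpoint names are placeholders, and the response parsing assumes the algorithm's usual JSON list of class probabilities.

    import json
    import boto3

    s3 = boto3.client('s3')
    runtime = boto3.client('sagemaker-runtime')

    bucket = 'my-bucket'                    # placeholder: bucket holding the images
    prefix = 'images/'                      # placeholder: prefix of the .jpg files
    endpoint_name = 'my-image-classifier'   # placeholder: deployed endpoint name

    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            # Stream the image bytes directly from S3; nothing is written to disk
            body = s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read()
            response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                               ContentType='application/x-image',
                                               Body=body)
            probs = json.loads(response['Body'].read())  # list of class probabilities
            print(obj['Key'], probs)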

    You can even build a simple service to perform pre-processing or post-processing with Lambda. Here's an example: https://medium.com/@julsimon/using-chalice-to-serve-sagemaker-predictions-a2015c02b033
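
    For illustration, a tiny Chalice app in the spirit of that post might look like the sketch below: it accepts raw image bytes over HTTP, forwards them to the endpoint, and returns the top class. The endpoint name and label list are placeholders you would replace with your own.

    import json
    import boto3
    from chalice import Chalice

    app = Chalice(app_name='image-classifier-api')
    runtime = boto3.client('sagemaker-runtime')

    ENDPOINT_NAME = 'my-image-classifier'   # placeholder: deployed endpoint name
    LABELS = ['cat', 'dog']                 # placeholder: class index -> label mapping

    @app.route('/predict', methods=['POST'], content_types=['application/x-image'])
    def predict():
        image_bytes = app.current_request.raw_body            # raw image payload
        response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                           ContentType='application/x-image',
                                           Body=image_bytes)
        probs = json.loads(response['Body'].read())            # class probabilities
        best = max(range(len(probs)), key=probs.__getitem__)   # index of the top class
        return {'label': LABELS[best], 'probability': probs[best]}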

    If you want batch prediction: the simplest way is to retrieve the trained model from SageMaker, write a few lines of ad-hoc MXNet code to load it and run all your predictions. Here's an example: https://mxnet.incubator.apache.org/tutorials/python/predict_image.html
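
    As a rough sketch of that batch approach, assuming you have downloaded and extracted the training job's model.tar.gz locally and that it contains the usual MXNet artifacts (image-classification-symbol.json and image-classification-0000.params; adjust the checkpoint prefix and input size if yours differ):

    import numpy as np
    import mxnet as mx

    # Load the trained network from the extracted SageMaker model artifacts
    sym, arg_params, aux_params = mx.model.load_checkpoint('image-classification', 0)
    mod = mx.mod.Module(symbol=sym, label_names=None, context=mx.cpu())
    mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
    mod.set_params(arg_params, aux_params, allow_missing=True)

    def predict(path):
        # Load and resize the image to the network's expected 224x224 RGB input
        img = mx.image.imread(path)
        img = mx.image.imresize(img, 224, 224)
        img = np.transpose(img.asnumpy().astype('float32'), (2, 0, 1))[np.newaxis, :]
        mod.forward(mx.io.DataBatch([mx.nd.array(img)]), is_train=False)
        return mod.get_outputs()[0].asnumpy().squeeze()  # class probabilities

    probs = predict('example.jpg')        # any local image file
    print(np.argmax(probs), probs.max())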

    Hope this helps.