pytorch, torchvision, torchserve

What is the preferred way to load images from S3 into TorchServe for inference?


I have an image classifier model that I plan to deploy via TorchServe. My question is: what is the ideal way to load and write images from/to S3 buckets, instead of the local filesystem, for inference? Should this logic live in the model handler file? Or should it be a separate worker that sends images to the inference endpoint, like this example, with the resulting image piped into an aws cp command, for instance?


Solution

  • There is no silver bullet or single clean-code approach here.

    It depends on your specific requirements, but let's go over each scenario you described.

    I. S3 Simple R/W with Model Handler:

    The advantages that I see: simplicity of workflow, and everything is centralized; there is no need to manage additional services, since everything lives in the handler. This makes it a quick way to build a POC.

    Disadvantages: scalability for a high volume of images, and tight coupling; maybe part of the upload flow needs to be handled by some other component in the future.

    Code example:

    import io

    import boto3
    import torch
    from PIL import Image
    from ts.torch_handler.base_handler import BaseHandler


    class ImageClassifierHandler(BaseHandler):
        def initialize(self, context):
            # One boto3 client per worker, created once at startup
            self.s3_client = boto3.client('s3')
            self.model = self.load_model()  # your own model-loading helper
            self.initialized = True

        def load_image_from_s3(self, bucket_name, object_key):
            response = self.s3_client.get_object(Bucket=bucket_name, Key=object_key)
            image_data = response['Body'].read()
            image = Image.open(io.BytesIO(image_data)).convert('RGB')
            return image

        def handle(self, data, context):
            bucket_name = data[0].get('bucket_name')
            object_key = data[0].get('object_key')

            image = self.load_image_from_s3(bucket_name, object_key)
            input_tensor = self.preprocess_input(image)  # your transforms + batch dimension

            with torch.no_grad():
                output = self.model(input_tensor)

            _, predicted = torch.max(output, 1)
            prediction = predicted.item()

            # Optionally upload the result back to S3
            result_key = f"results/{object_key.split('/')[-1].replace('.jpg', '_result.txt')}"
            self.s3_client.put_object(Bucket=bucket_name, Key=result_key, Body=str(prediction))

            return [{"prediction": prediction, "result_s3_path": f"s3://{bucket_name}/{result_key}"}]

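    As a usage sketch, a client could call that handler roughly like this, assuming the model archive was registered under the name image-classifier and using placeholder bucket/key values. Note that, depending on the request's content type, TorchServe may wrap a JSON payload under a 'body' key, so the handler might need data[0].get('body', data[0]) to unwrap it first.

    import requests

    # Hypothetical request; the bucket and key values are placeholders
    resp = requests.post(
        "http://localhost:8080/predictions/image-classifier",
        json={"bucket_name": "my-images-bucket", "object_key": "inputs/cat.jpg"},
    )
    print(resp.json())  # e.g. {"prediction": 3, "result_s3_path": "s3://my-images-bucket/results/cat_result.txt"}
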
    II. Using a Separate Worker to Handle S3 Operations:

    This looks more elegant, right? But you introduce complexity that you have to maintain. The advantages and disadvantages of the previous method apply here in reverse.

    This means the advantages of this method are scalability and, if done well, modularity: the S3 read/write logic can be reused by other projects/models. The disadvantages are obviously complexity, a new service to maintain, and a new potential failure point.

    If you are in an enterprise environment where you might reuse this later, rather than a proof-of-concept/learning/academic setting, then go with this option.

    You need a TorchServe endpoint for the image classifier, let's say:

    torchserve_endpoint = "http://localhost:8080/predictions/image-classifier"
    

    Then you can do something like this:

    import io

    import boto3
    import requests
    from PIL import Image

    s3_client = boto3.client('s3')

    def fetch_image_from_s3(bucket_name, object_key):
        response = s3_client.get_object(Bucket=bucket_name, Key=object_key)
        image_data = response['Body'].read()
        return Image.open(io.BytesIO(image_data)).convert('RGB')

    def send_image_for_inference(image):
        # Re-encode the image and send it as a file upload to TorchServe
        buffered = io.BytesIO()
        image.save(buffered, format="JPEG")
        buffered.seek(0)

        files = {'data': buffered}
        response = requests.post(torchserve_endpoint, files=files)
        return response.json()

    def upload_result_to_s3(bucket_name, object_key, result):
        result_key = f"results/{object_key.split('/')[-1].replace('.jpg', '_result.txt')}"
        s3_client.put_object(Bucket=bucket_name, Key=result_key, Body=str(result))
        return f"s3://{bucket_name}/{result_key}"
    

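    To tie the three helpers together, here is a rough driver sketch that walks a bucket prefix and pushes every object through the endpoint. The bucket and prefix names are placeholders; a real worker would also need error handling, retries, and a way to avoid reprocessing objects it has already handled.

    bucket = "my-images-bucket"  # placeholder bucket name
    prefix = "inputs/"           # placeholder prefix to scan for images

    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            key = obj['Key']
            image = fetch_image_from_s3(bucket, key)
            result = send_image_for_inference(image)
            result_path = upload_result_to_s3(bucket, key, result)
            print(f"{key} -> {result_path}")
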
    Now of course the second one looks approachable, but believe me: whenever you add another layer of complexity unnecessarily, it will come back to bite you. The second option is not as simple once you consider managing the extra service, running inference separately, and dealing with multiple AWS resources.

    If your business does not require this kind of architecture, just go with the first option.