python-3.x, scikit-learn, aws-lambda, scikit-image

Build custom AWS Lambda layer for Scikit-image


Outline: I need to use scikit-image inside some AWS lambda functions, so I'm looking to build a custom AWS lambda layer containing scikit-image.

My questions should, in general, apply to any Python module (notably scikit-learn), or really to any custom layer.


Background: After much googling and reading, it seems the best way to do this is to use docker to run the AWS lambda runtime locally, install/compile scikit-image (or whichever module you need) inside that container, and then upload the result to AWS as a custom layer.
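For concreteness, here is a minimal sketch of that workflow, assuming the lambci/lambda:build-python3.6 image mentioned below and a layer name of scikit-image-layer (both are just illustrative choices, not anything prescribed by AWS):

    # Run pip inside a container that mimics the Lambda build environment,
    # installing scikit-image into the python/ directory layout that layers expect.
    docker run --rm -v "$PWD":/var/task -w /var/task lambci/lambda:build-python3.6 \
        pip install scikit-image -t python/

    # Zip the result and publish it as a layer version.
    zip -r scikit-image-layer.zip python/
    aws lambda publish-layer-version \
        --layer-name scikit-image-layer \
        --zip-file fileb://scikit-image-layer.zip \
        --compatible-runtimes python3.6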

This is conceptually pretty simple, but I'm struggling a bit with the best-practice way to do it. I've got it working, but I'm not sure I'm doing it the best/right/optimal/secure way ... there are a million all-slightly-different blog posts about this, and the AWS docs themselves are (IMHO) too detailed while skipping over some of the basic questions.

I've basically been trying to follow two good Medium posts, here and here ... kudos to those guys.


My main questions are:

  1. Where is the best place to find the latest AWS AMI docker image?

There are multiple locations/versions (even on Amazon itself) for what is supposedly the latest image, e.g. https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html or https://cdn.amazonlinux.com/os-images/2.0.20190823.1/.

This is ignoring the multitude of non-Amazon, GitHub-hosted possibilities, such as lambci/lambda:build-python3.6 from the Medium post here, or onema/amazonlinux4lambda from here.

I'd prefer to use an Amazon-provided docker image, for both security and up-to-dateness.

  2. Is the AWS lambda runtime here, which links to this AMI, a docker image? If so (or if not), how do you download it to run locally?
  3. How do you know when you might need to rebuild a layer, because Amazon has changed the AWS lambda runtime in a way that breaks your layer built against an older runtime?
  4. Is it better to build (compile, in the case of scikit-image) the pip-installed module inside the docker AMI container, or simply to tell pip to download the pre-built version and hope/trust it gets the compiled libs that are best for the AMI you're running?

Basically, here I'm concerned about stability and performance. I'd like to ensure that the compiled libraries (for scikit-image in this case) are as optimized as possible for the AMI container; see the sketch after this list for the pre-built option.

  5. Is it better to just download and use AWS's SAM to do all of this? (It looks like overkill and complicated, but it does seem to take care of ensuring you're always using the 'correct' AMI docker container.)
  6. Are there any (good, trustworthy) repos of pre-built lambda layers around (which might make all this a moot point)? I looked but couldn't find any.
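Regarding question 4, here is a hedged sketch of the "let pip fetch pre-built wheels" option: pip can be restricted to binary wheels for a given platform tag instead of compiling locally. The manylinux2010_x86_64 tag, Python 3.6, and the python/ target directory are illustrative assumptions, not anything taken from the question.

    # Fetch only pre-built manylinux wheels targeted at the Lambda architecture,
    # rather than compiling scikit-image from source inside the container.
    pip install scikit-image \
        --target python/ \
        --platform manylinux2010_x86_64 \
        --python-version 3.6 \
        --implementation cp \
        --only-binary=:all: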

...thanks for any advice, thoughts and comments!


Solution

  • As of v0.50.0, the SAM CLI has direct support for building layers: you decorate your AWS::Serverless::LayerVersion resource with metadata specifying which build method/runtime to use.

    MyLayer:
      Type: AWS::Serverless::LayerVersion
      Properties:
        Description: Layer description
        ContentUri: 'my_layer/'
        CompatibleRuntimes:
          - python3.8
      Metadata:
        BuildMethod: python3.8
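With that metadata in place, a minimal usage sketch (assuming the layer's dependencies are listed in my_layer/requirements.txt, which is the convention sam build looks for when BuildMethod is python3.8):

    # Build the layer (and functions) inside a Lambda-like container, then deploy.
    sam build --use-container
    sam deploy --guided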