python amazon-web-services aws-lambda xgboost package-management

How to create any AWS Lambda Python Layer? (Usage example with XGBoost)


I am having trouble creating a Lambda layer for the xgboost library. I'm running:

I'm grabbing a zip of xgboost and its dependencies from here (https://github.com/alexeybutyrev/aws_lambda_xgboost) and loading it into a layer. When I try to test my Lambda function, I get this error:

Unable to import module 'lambda_function': No module named 'xgboost.core'

It looks like __init__.py is trying to reference core.py via from .core import <stuff>

Has anyone encountered this error with AWS Lambda before?


Solution

  • EDIT: As @Marcin has remarked, the first approach (A) only works for packages smaller than 262 MB unzipped.

    A. Python Packages within Lambda Layer size limit

    You can also do it with the AWS SAM CLI and Docker (see this link to install the SAM CLI), building the packages inside a container. Basically, you initialize a default template with Python as the runtime and then list the packages in the requirements.txt file. I found it easier than the article you mentioned. I'll leave the steps here in case you want to use them in the future.

    1. Initialize a default SAM template

    Under any folder that you want to keep the project, you can type

    sam init
    

    This will prompt a series of questions. For a quick setup, we choose the Quick Start Templates as follows:

    1 - AWS Quick Start Templates
    
    2 - Python 3.8
    
    Project name [sam-app]: your_project_name
    
    1 - Hello World Example
    

    Choosing the Hello World Example generates a default Lambda function with a requirements.txt file. Next, we edit that file to list the package you want, in this case xgboost.

    2. Specify packages to install

    cd your_project_name
    code hello_world/requirements.txt
    

    Since I use Visual Studio Code as my editor, this opens the file in it. Now I can list the package, here xgboost, in the file:

    your_python_package
    

    Here comes the reason to have Docker installed. Some packages rely on compiled C++ code, so it is recommended to build them inside a container (this matters especially on Windows). Now move to the folder where the template.yaml file is located and type

    sam build -u
    

    3. Zip packages

    Some of the generated files should not be included in the Lambda layer, because we only want to keep the Python libraries. You can remove them as follows:

    rm .aws-sam/build/HelloWorldFunction/app.py
    rm .aws-sam/build/HelloWorldFunction/__init__.py
    rm .aws-sam/build/HelloWorldFunction/requirements.txt
    

    and then zip the remaining content of the folder.

    cp -r .aws-sam/build/HelloWorldFunction/ python/
    zip -r my_layer.zip python/
    

    Here we place the layer contents in the python/ folder, as the docs require. On Windows, replace the zip command with Compress-Archive python/ my_layer.zip.
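    If you would rather not depend on zip or Compress-Archive, the same step can be sketched in Python. This is only an illustration: build_layer_zip is a hypothetical helper name, and the paths are the ones used in the steps above.

```python
import zipfile
from pathlib import Path


def build_layer_zip(build_dir, zip_path, prefix="python"):
    """Zip every file under build_dir so it lands below `prefix/` in the archive."""
    build_dir = Path(build_dir)
    # Sample-function files from the SAM template that do not belong in a layer.
    skip = {"app.py", "__init__.py", "requirements.txt"}
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(build_dir.rglob("*")):
            if not path.is_file():
                continue
            relative = path.relative_to(build_dir)
            # Skip only the top-level sample files; packages keep their own __init__.py.
            if len(relative.parts) == 1 and relative.name in skip:
                continue
            archive.write(path, (Path(prefix) / relative).as_posix())
    return zip_path


if __name__ == "__main__":
    build_layer_zip(".aws-sam/build/HelloWorldFunction", "my_layer.zip")
```

    Because the sample files are skipped while zipping, the rm commands become optional with this variant.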

    4. Upload your Layer to AWS

    In the AWS console, go to Lambda, then choose Layers and Create Layer. There you can upload your .zip file.

    Note that for zip files over 50 MB, you should upload the .zip file to an S3 bucket and provide its path, for example https://s3.amazonaws.com/mybucket/my_layer.zip.
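    The upload can also be scripted. Below is a minimal sketch with boto3, assuming AWS credentials are already configured. publish_layer is a hypothetical helper, and the bucket, key and layer names are placeholders.

```python
def publish_layer(s3_client, lambda_client, zip_path, bucket, key,
                  layer_name, runtimes=("python3.8",)):
    """Upload the layer zip to S3, then publish it as a new layer version."""
    s3_client.upload_file(zip_path, bucket, key)
    response = lambda_client.publish_layer_version(
        LayerName=layer_name,
        Content={"S3Bucket": bucket, "S3Key": key},
        CompatibleRuntimes=list(runtimes),
    )
    return response["LayerVersionArn"]


if __name__ == "__main__":
    import boto3  # third-party; imported here so the helper itself has no hard dependency

    # Placeholder names: replace with your own bucket, key and layer name.
    arn = publish_layer(
        boto3.client("s3"),
        boto3.client("lambda"),
        "my_layer.zip",
        "mybucket",
        "my_layer.zip",
        "my_layer",
    )
    print(arn)
```

    The clients are passed in as arguments so the function can be tried with stand-in objects before touching a real account.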

    B. Python packages that exceed Lambda Layer limits

    The xgboost package on its own is more than 300 MB, so creating the layer fails with a size-limit error.
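    To find out before uploading whether a build will fit, you can measure the unzipped size of the folder. This is a small sketch: the 262,144,000-byte figure is 250 MiB, which I take to be the limit behind the 262 MB mentioned above, and the function names are hypothetical.

```python
import os

# 250 MiB; assumed here to be the unzipped limit behind the "262 MB" figure.
LAYER_LIMIT_BYTES = 262_144_000


def unzipped_size(root):
    """Return the total size in bytes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def fits_in_layer(root, limit=LAYER_LIMIT_BYTES):
    """True if the folder's unzipped size does not exceed the limit."""
    return unzipped_size(root) <= limit


if __name__ == "__main__":
    size = unzipped_size("python")
    print(f"{size / 1_000_000:.1f} MB, fits: {fits_in_layer('python')}")
```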

    As @Marcin has kindly pointed out, the SAM CLI approach above does not work directly for Python layers that exceed the limit. There is an open issue on GitHub about specifying a custom Docker image when running sam build -u, and a possible workaround is retagging the default lambci/lambda image.

    So, how can we get around this? There are already some useful resources that I will just point to.

    • First, the Medium article that @Alex accepted as the solution, which follows the code in this repo.
    • Second, alexeybutyrev's approach, which applies the strip command to reduce the size of the libraries. It is available in a GitHub repo, with instructions provided.
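    The idea behind the second approach can be sketched as follows. This is not the code from that repo, only an illustration that assumes the strip binary is on the PATH (for example inside a Linux build container). The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def is_shared_object(path):
    """Match names like libfoo.so or libfoo.so.1."""
    name = path.name
    return name.endswith(".so") or ".so." in name


def find_shared_objects(root):
    """Return all shared-object files below root, skipping symlinks."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and not p.is_symlink() and is_shared_object(p)
    )


def strip_shared_objects(root):
    """Run `strip` on every shared object below root and return the paths."""
    stripped = []
    for path in find_shared_objects(root):
        # Assumes the `strip` command from binutils is installed.
        subprocess.run(["strip", str(path)], check=True)
        stripped.append(path)
    return stripped


if __name__ == "__main__":
    for path in strip_shared_objects("python"):
        print("stripped", path)
```

    Stripping removes symbols from the compiled libraries, so check afterwards that the package still imports.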

    Edit (December 2020)

    This month AWS released container image support for AWS Lambda. Use the following tree structure for your project:

    Project/
    |-- app/
    |   |-- app.py
    |   |-- requirements.txt
    |   |-- xgb_trained.bin
    |-- Dockerfile
     
    

    You can deploy an XGBoost model with the following Docker image. See this repo's instructions for a detailed explanation.

    # Dockerfile based on https://docs.aws.amazon.com/lambda/latest/dg/images-create.html
    
    # Define global args
    ARG FUNCTION_DIR="/function"
    ARG RUNTIME_VERSION="3.6"
    
    # Choose buster image
    FROM python:${RUNTIME_VERSION}-buster as base-image
    
    # Install aws-lambda-cpp build dependencies
    RUN apt-get update && \
      apt-get install -y \
      g++ \
      make \
      cmake \
      unzip \
      libcurl4-openssl-dev \
      git
    
    
    # Include global args in this stage of the build
    # (an ARG declared before FROM is not visible in RUN unless redeclared)
    ARG FUNCTION_DIR
    ARG RUNTIME_VERSION
    # Create function directory
    RUN mkdir -p ${FUNCTION_DIR}
    
    # Copy function code
    COPY app/* ${FUNCTION_DIR}/
    
    # Install python dependencies and runtime interface client
    RUN python${RUNTIME_VERSION} -m pip install \
                       --target ${FUNCTION_DIR} \
                       --no-cache-dir \
                       awslambdaric \
                       -r ${FUNCTION_DIR}/requirements.txt
    
    # Install xgboost from source
    RUN git clone --recursive https://github.com/dmlc/xgboost
    RUN cd xgboost && make -j4 && cd python-package && python${RUNTIME_VERSION} setup.py install
    
    # Final stage: continue from base-image, which already holds the system-wide xgboost install
    FROM base-image
    
    # Include global arg in this stage of the build
    ARG FUNCTION_DIR
    
    # Set working directory to function root directory
    WORKDIR ${FUNCTION_DIR}
    
    # Copy in the build image dependencies
    COPY --from=base-image ${FUNCTION_DIR} ${FUNCTION_DIR}
    
    ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
    
    CMD [ "app.handler" ]
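    For completeness, here is a minimal sketch of what app/app.py could look like so that the CMD above finds app.handler. The event shape ({"features": [[...], ...]}) is an assumption for illustration, not something the container setup prescribes. It also assumes the model returns one score per row.

```python
import json

_booster = None  # cached between invocations of a warm container


def _load_booster(model_path="xgb_trained.bin"):
    """Load the model once; xgboost is imported lazily."""
    global _booster
    if _booster is None:
        import xgboost as xgb

        _booster = xgb.Booster()
        _booster.load_model(model_path)
    return _booster


def _predict(rows):
    """Score a list of feature rows with the cached booster."""
    import numpy as np
    import xgboost as xgb

    matrix = xgb.DMatrix(np.asarray(rows, dtype=float))
    return _load_booster().predict(matrix).tolist()


def make_handler(predict):
    """Build a Lambda handler around any callable that scores rows."""
    def handler(event, context):
        # Assumed event shape: {"features": [[f1, f2, ...], ...]}
        rows = event["features"]
        predictions = [float(p) for p in predict(rows)]
        return {"statusCode": 200, "body": json.dumps({"predictions": predictions})}
    return handler


# Entry point referenced by CMD [ "app.handler" ]
handler = make_handler(_predict)
```

    Splitting out make_handler lets you try the handler locally with a stand-in scoring function before building the image.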