Tags: pandas, aws-lambda, aws-cloud9

Cloud9 deploy hitting size limit for numpy, pandas


I'm building in Cloud9 to deploy to Lambda. My function works fine in Cloud9, but when I go to deploy I get the error:

Unzipped size must be smaller than 262144000 bytes

Running du -h | sort -h shows that my biggest offenders are:

  • /debug at 291M
  • /numpy at 79M
  • /pandas at 47M
  • /botocore at 41M

My function is extremely simple: it calls a service, uses pandas to format the response, and sends it on.

  1. What is in debug and how do I slim it down/eliminate it from the deploy package?
  2. How do others use libraries at all if they eat up most of the package size limit?

Solution

  • A brief background to understand the root cause of the problem

    The problem is not with your function but with the size of your deployment package. Per the AWS documentation, a zipped package uploaded directly to Lambda must not exceed 50 MB, and the unzipped package must not exceed 250 MB (262144000 bytes). If your zipped package is larger than 50 MB, which happens easily since a library can pull in many dependencies, upload the zip to an S3 bucket and point Lambda at the object instead. Note: uploading via S3 lifts the 50 MB direct-upload limit, but the 250 MB unzipped limit still applies. The error message you have posted, Unzipped size must be smaller than 262144000 bytes, is referring to the unzipped size of the deployment package, i.e. your code plus its libraries.

    Now, some facts to understand when working with AWS:

    1. Lambda execution environments start out empty: none of your libraries are pre-installed there.
    2. They run on a Linux kernel, so packages must be compiled for Linux.
    3. AWS Cloud9 is only an IDE, like RStudio or PyCharm. Packages you install there live in the Cloud9 environment, not in Lambda, so they must be shipped with your function.

    This means you'll need to:

    1. identify the package and its related dependencies;

    2. extract the Linux-compiled packages from Cloud9 and save them in a folder structure like python/lib/python3.6/site-packages/.

    Possible/workable solutions

    You can work around the limit by reducing the deployment package size and/or by moving the libraries into a Lambda layer. See below.

    Reducing the deployment package size

    • Manual method: delete the *.dist-info folders and __pycache__ caches inside each library folder. You'll need to look through each library folder manually to find and delete them.

    • Automatic method: the same deletions can be scripted, e.g. with find.
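    One way the manual deletion could be scripted (a sketch; run it from inside the directory holding the installed libraries, and note that deleting *.dist-info removes pip's metadata for those packages):

```shell
# Remove bytecode caches and package metadata to shrink the package.
# -prune stops find from descending into a directory it just deleted.
find . -type d -name "__pycache__" -prune -exec rm -rf {} +
find . -type d -name "*.dist-info" -prune -exec rm -rf {} +
find . -type f -name "*.pyc" -delete
```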

    Use Layers

    In the AWS console, go to Lambda and create a layer.

    Point the layer at the S3 object containing the python package folder. Ensure the Lambda function's IAM role has permission to access the S3 bucket.

    Make sure the unzipped folder size is less than 250 MB (262144000 bytes). If it is larger, it cannot be published as a layer and you'll get the error: Failed to create layer version: Unzipped size must be smaller than 262144000 bytes. Note that this limit applies to the function and all of its layers combined, so a layer helps organize the libraries but does not raise the overall ceiling.
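    The console steps above can also be done with the AWS CLI. A sketch; the bucket, key, and layer names are placeholders, and the zip is assumed to already be in S3:

```shell
# The hard limit the error message refers to: 250 MiB, unzipped.
LIMIT=262144000

# Publish a layer version from a zip already uploaded to S3.
aws lambda publish-layer-version \
    --layer-name pandas-numpy \
    --description "pandas + numpy, built on Amazon Linux" \
    --content S3Bucket=my-deploy-bucket,S3Key=pandas-layer.zip \
    --compatible-runtimes python3.6
```

    The published layer version can then be attached to the function under Layers in the Lambda console.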