[python] [amazon-web-services] [docker] [aws-lambda] [dockerfile]

Running third-party command line software on AWS Lambda container image


Introduction

I'm new to AWS and I'm trying to run Jellyfish on an AWS Lambda function using a Python container image built from an AWS base image.

Context

I want to achieve this by installing the software from source in my Python container image (built from an AWS base image), then pushing the image to AWS ECR so a Lambda function can use it.

My AWS architecture is:

  • biome-test-bucket-out (AWS S3 bucket with trigger) -> Lambda function -> biome-test-bucket-jf (AWS S3 bucket)

First, to install it from source, I downloaded the latest release .tar.gz file locally, extracted it, and copied the contents into the container.
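
For reference, the download-and-extract step can also be scripted. A minimal sketch with the standard library (the release URL is my assumption; check the project's GitHub releases page):

import tarfile
import urllib.request

# Assumed release asset URL; verify against the Jellyfish releases page
url = ("https://github.com/gmarcais/Jellyfish/releases/download/"
       "v2.3.0/jellyfish-2.3.0.tar.gz")

# Download the tarball next to the Dockerfile
urllib.request.urlretrieve(url, "jellyfish-2.3.0.tar.gz")

# Extract it, producing the jellyfish-2.3.0/ directory the Dockerfile COPYs
with tarfile.open("jellyfish-2.3.0.tar.gz", "r:gz") as tar:
    tar.extractall(".")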

My Dockerfile looks like this:

FROM public.ecr.aws/lambda/python:3.9

# Copy contents from latest release
WORKDIR /jellyfish-2.3.0 
COPY ./jellyfish-2.3.0/ .
WORKDIR /

# Install Jellyfish build dependencies
RUN yum update -y && \
    yum install -y gcc-c++ make

# Jellyfish installation (in /bin)
RUN jellyfish-2.3.0/configure --prefix=/bin
RUN make -j 4
RUN make install

RUN chmod -R 777 /bin

# Copy function code
COPY app.py ${LAMBDA_TASK_ROOT}
CMD [ "app.lambda_handler" ]

The installation prefix is /bin because it's in $PATH, so I can run the jellyfish command directly without spelling out a full path.
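
A quick way to sanity-check that from Python (e.g. in a locally-run container); a minimal sketch, where the --version flag is assumed from Jellyfish's usual CLI:

import shutil
import subprocess

# shutil.which searches $PATH the same way the shell does,
# so this should print /bin/jellyfish if the install worked
print(shutil.which("jellyfish"))

# Since the binary resolves via $PATH, the bare command name is enough
subprocess.run(["jellyfish", "--version"], check=True)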

My app.py looks like this:

import logging
import subprocess
from pathlib import Path, PurePosixPath

import boto3

s3 = boto3.client('s3')

# Lambda's runtime configures a log handler; raise the root level so
# logging.info() calls actually reach CloudWatch
logging.getLogger().setLevel(logging.INFO)

print('Loading function')


def lambda_handler(event, context):

    # Declare buckets & get name of file
    # (note: keys in S3 events are URL-encoded; keys with spaces or special
    # characters would need urllib.parse.unquote_plus)
    bucket_in = "biome-test-bucket-out"
    bucket_out = "biome-test-bucket-jf"
    key = event["Records"][0]["s3"]["object"]["key"]

    # Paths where files will be stored
    input_file_path = f"/tmp/{key}"
    file = Path(key).stem
    output_file_path = f"/tmp/{file}.jf"

    # Download file
    with open(input_file_path, 'wb') as f:
        s3.download_fileobj(bucket_in, key, f)

    # Run jellyfish with downloaded file
    command = f"/bin/jellyfish count -C -s 100M -m 20 -t 1 {input_file_path} -o {output_file_path}"
    logging.info(subprocess.check_output(command, shell=True))

    # Upload file to bucket
    try:
        with open(output_file_path, 'rb') as f:
            p = PurePosixPath(output_file_path).name
            s3.upload_fileobj(f, bucket_out, p)
    except Exception as e:
        logging.error(e)
        return False

    return True
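
For local testing, the AWS base images bundle the Lambda Runtime Interface Emulator, so after something like docker run -p 9000:8080 <image> the handler can be invoked with a minimal synthetic S3 event. A sketch (the key name is just a placeholder):

import json
import urllib.request

# Minimal stand-in for the S3 put-event shape the handler reads
event = {"Records": [{"s3": {"object": {"key": "sample.fasta"}}}]}

# Default endpoint exposed by the Runtime Interface Emulator
url = "http://localhost:9000/2015-03-31/functions/function/invocations"

req = urllib.request.Request(
    url,
    data=json.dumps(event).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())

Note that the handler still needs AWS credentials and real buckets for the S3 calls to succeed.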


Problem

If I build and run the image locally, everything works fine, but once the image runs in Lambda I get this error:

/usr/bin/ld: cannot open output file /bin/.libs/12-lt-jellyfish: Read-only file system

That file isn't there after the installation, so I'm guessing that Jellyfish creates a new file in /bin/.libs/ when it runs, and that path is read-only at runtime. I'm not sure how to tackle this, any ideas?
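
While debugging, a throwaway handler like this sketch can confirm which directories are writable at runtime (the path list is just an example):

import os


def lambda_handler(event, context):
    # Try to create and delete a small file in each candidate directory
    results = {}
    for d in ("/bin", "/var/task", "/tmp"):
        probe = os.path.join(d, ".write-probe")
        try:
            with open(probe, "w") as f:
                f.write("x")
            os.remove(probe)
            results[d] = "writable"
        except OSError as e:
            results[d] = f"read-only ({e.strerror})"
    return results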

Thank you.


Solution

  • I found out that only /tmp is writable at runtime, and anything baked into /tmp in the image is gone by the time the function runs (Lambda mounts a fresh ephemeral /tmp). The workaround is to install into /tmp at build time, so the configured paths point there, then move the installation out to /jf in the image and copy it back into /tmp at app runtime.

    Dockerfile:

    # Configure with --prefix=/tmp so the paths baked in at build time
    # match where the files will live at runtime
    WORKDIR /tmp
    COPY ./jellyfish-2.3.0/ .
    RUN ./configure --prefix=/tmp
    RUN make -j 4
    RUN make install

    # Stash the installation in /jf (dotglob makes * match hidden files;
    # shopt works here because the RUN shell on this base image is bash)
    WORKDIR /jf
    RUN shopt -s dotglob && mv /tmp/* /jf
    WORKDIR /
    RUN chmod -R 777 /jf
    

    app.py:

    import shutil

    # Copy the installation back into /tmp (the only writable path)
    shutil.copytree("/jf", "/tmp", copy_function=shutil.copy2,
                    dirs_exist_ok=True)

    # Run jellyfish (from its /tmp prefix) with the downloaded file
    jellyfish_command = "/tmp/bin/jellyfish count -C -s 100M -m 20 -t 1 "
    command = f"{jellyfish_command}{input_file_path} -o {output_file_path}"
    logging.info(subprocess.check_output(command, shell=True))
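
    A possible refinement: module-level code runs once per container init, and /tmp persists across warm invocations, so the copy can be hoisted out of the handler and only cold starts pay for it. A sketch:

    import os
    import shutil

    # Runs once per container (cold start); /tmp survives warm invocations,
    # so skip the copy if the binary is already in place
    if not os.path.exists("/tmp/bin/jellyfish"):
        shutil.copytree("/jf", "/tmp", copy_function=shutil.copy2,
                        dirs_exist_ok=True)


    def lambda_handler(event, context):
        ...  # jellyfish is now available at /tmp/bin/jellyfish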