I'm new to AWS and I'm trying to run Jellyfish on an AWS Lambda function using a Python container image built from an AWS base image.
I want to do this by installing the software from source in that container image, pushing the image to Amazon ECR, and then using it from a Lambda function.
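My AWS architecture (diagram not included here): a file uploaded to an S3 input bucket triggers the Lambda function, which runs Jellyfish on it and uploads the result to an S3 output bucket.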
First, to install it from source, I downloaded the latest release .tar.gz file locally, uncompressed it, and copied the contents into the container.
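If you want to script the "uncompress" step instead of doing it by hand, a minimal sketch with Python's standard tarfile module is below. The helper name extract_release is hypothetical, and the archive name you pass (for example jellyfish-2.3.0.tar.gz) is whatever release you downloaded; the code only unpacks an archive that is already on disk.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_release(archive: str, dest: str = ".") -> List[str]:
    """Unpack a .tar.gz source release into dest and return its top-level entries.

    Hypothetical helper; the archive name is assumed, e.g. "jellyfish-2.3.0.tar.gz".
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects unsafe paths
            tar.extractall(path=dest_path, filter="data")
        else:
            tar.extractall(path=dest_path)
        # Distinct top-level names (normally a single source directory)
        top_level = sorted({m.name.split("/")[0] for m in tar.getmembers()})
    return top_level
```

Calling extract_release("jellyfish-2.3.0.tar.gz") next to the Dockerfile should leave a jellyfish-2.3.0/ directory that a COPY ./jellyfish-2.3.0/ . line can pick up.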
My Dockerfile looks like this:
FROM public.ecr.aws/lambda/python:3.9
# Copy contents from latest release
WORKDIR /jellyfish-2.3.0
COPY ./jellyfish-2.3.0/ .
WORKDIR /
# Installing Jellyfish dependencies
RUN yum update -y
RUN yum install -y gcc-c++
RUN yum install -y make
# Jellyfish installation (in /bin)
RUN jellyfish-2.3.0/configure --prefix=/bin
RUN make -j 4
RUN make install
RUN chmod -R 777 /bin
# Copy function code
COPY app.py ${LAMBDA_TASK_ROOT}
CMD [ "app.lambda_handler" ]
I chose /bin as the installation prefix because it's on $PATH, so I can simply run the "jellyfish ..." command.
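One detail worth checking here: with autoconf, --prefix=PREFIX installs programs into PREFIX/bin by default, so --prefix=/bin would normally put the installed binary in /bin/bin, not /bin. A small sketch for checking which file a bare jellyfish command actually resolves to inside the image uses the standard library's shutil.which; the helper name resolve_command is hypothetical.

```python
import os
import shutil
from typing import Optional


def resolve_command(name: str, path: Optional[str] = None) -> Optional[str]:
    """Return the absolute path a bare command name would run, or None.

    `path` defaults to the current $PATH, mirroring what the shell searches.
    """
    found = shutil.which(name, path=path)
    # realpath follows symlinks (e.g. /bin -> /usr/bin on some distributions)
    return os.path.realpath(found) if found else None
```

Calling print(resolve_command("jellyfish")) from Python inside the running container shows which file the shell would execute for "jellyfish ...".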
My app.py looks like this:
import logging
import subprocess
from pathlib import Path, PurePosixPath

import boto3

s3 = boto3.client('s3')
# Set the root logger to INFO so logging.info() output shows up in CloudWatch Logs
logger = logging.getLogger()
logger.setLevel(logging.INFO)
print('Loading function')


def lambda_handler(event, context):
    # Declare buckets & get name of file
    bucket_in = "biome-test-bucket-out"
    bucket_out = "biome-test-bucket-jf"
    key = event["Records"][0]["s3"]["object"]["key"]

    # Paths where files will be stored
    input_file_path = f"/tmp/{key}"
    stem = Path(key).stem  # avoid shadowing the built-in name "file"
    output_file_path = f"/tmp/{stem}.jf"

    # Download file
    with open(input_file_path, 'wb') as f:
        s3.download_fileobj(bucket_in, key, f)

    # Run jellyfish with downloaded file (argument list instead of shell=True,
    # so characters in the key can't be interpreted by a shell)
    command = ["/bin/jellyfish", "count", "-C", "-s", "100M", "-m", "20",
               "-t", "1", input_file_path, "-o", output_file_path]
    logger.info(subprocess.check_output(command))

    # Upload file to bucket
    try:
        with open(output_file_path, 'rb') as f:
            p = PurePosixPath(output_file_path).name
            s3.upload_fileobj(f, bucket_out, p)
    except Exception as e:
        logger.error(e)
        return False
    return True
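Object keys in S3 event notifications are URL-encoded (a space arrives as "+"), and a key with a prefix such as reads/sample.fa would make /tmp/{key} point into a directory that doesn't exist. A hedged sketch of a helper that normalizes the key before building the /tmp paths is below; local_paths is a hypothetical name, and it keeps the handler's naming scheme (input file under /tmp, output as <stem>.jf).

```python
from pathlib import PurePosixPath
from typing import Tuple
from urllib.parse import unquote_plus


def local_paths(raw_key: str, tmp_dir: str = "/tmp") -> Tuple[str, str, str]:
    """Map an S3 event key to (decoded_key, input_path, output_path).

    The decoded key is what download_fileobj needs; only the final path
    component is used locally, so keys with prefixes need no subdirectories.
    """
    key = unquote_plus(raw_key)  # S3 event keys are URL-encoded
    name = PurePosixPath(key).name
    stem = PurePosixPath(key).stem
    input_path = str(PurePosixPath(tmp_dir) / name)
    output_path = str(PurePosixPath(tmp_dir) / f"{stem}.jf")
    return key, input_path, output_path
```

In the handler, key, input_file_path, output_file_path = local_paths(event["Records"][0]["s3"]["object"]["key"]) would replace the three assignments, with the decoded key passed to download_fileobj.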
If I build and run the image locally, everything works fine, but once the image runs in Lambda I get this error:
/usr/bin/ld: cannot open output file /bin/.libs/12-lt-jellyfish: Read-only file system
That file isn't there after the installation, so I'm guessing that Jellyfish creates a new file in /bin/.libs/ when it runs and that the file has only read permissions. I'm not sure how to tackle this. Any ideas?
Thank you.
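Regarding the /bin/.libs error, one possible explanation (not verified here): the Dockerfile runs configure and make from / (WORKDIR /), so the build tree's bin/ directory is /bin itself. For libtool-built programs, bin/jellyfish in an uninstalled build tree is usually a shell wrapper script, and such a wrapper may relink the real program into .libs/lt-jellyfish when that file doesn't exist yet, using a temporary name made of a process ID plus lt-jellyfish, which matches 12-lt-jellyfish. The sketch below checks whether a file looks like such a wrapper, assuming it carries libtool's usual "temporary wrapper script" comment near the top; looks_like_libtool_wrapper is a hypothetical helper.

```python
from pathlib import Path

# Assumption: libtool wrappers contain a comment such as
# "# bin/jellyfish - temporary wrapper script for .libs/jellyfish" near the top.
WRAPPER_MARKER = b"temporary wrapper script"


def looks_like_libtool_wrapper(path: str, max_bytes: int = 4096) -> bool:
    """Heuristic: True if `path` is a script mentioning libtool's wrapper marker."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        head = fh.read(max_bytes)
    # Compiled executables start with b"\x7fELF"; wrappers are shell scripts
    return head.startswith(b"#!") and WRAPPER_MARKER in head
```

If looks_like_libtool_wrapper("/bin/jellyfish") returns True inside the image, the handler is calling the wrapper rather than the installed program, and pointing it at wherever make install placed the real binary might avoid the runtime relink (untested).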
Update: I found out that only /tmp is writable in Lambda, and anything the image puts in /tmp at build time is gone when the function runs. I found a workaround: install Jellyfish in /tmp, move the files out of /tmp in the image, and copy them back into /tmp at runtime.
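To check at runtime which directories the function can actually write to, a quick probe like the following could be dropped into the handler temporarily. It tries to create a temporary file in each candidate directory; writable_dirs and the candidate list are illustrative, not part of the original code.

```python
import tempfile
from typing import Iterable, List


def writable_dirs(candidates: Iterable[str]) -> List[str]:
    """Return the candidate directories where a file can actually be created."""
    ok = []
    for d in candidates:
        try:
            # TemporaryFile is deleted automatically when closed
            with tempfile.TemporaryFile(dir=d):
                pass
        except OSError:  # read-only, missing, or permission denied
            continue
        ok.append(d)
    return ok
```

Logging writable_dirs(["/tmp", "/bin", "/var/task"]) in Lambda should report only /tmp if, as described above, that is the only writable location.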
Dockerfile:
WORKDIR /tmp
COPY ./jellyfish-2.3.0/ .
RUN ./configure --prefix=/tmp
RUN make -j 4
RUN make install
WORKDIR /jf
RUN shopt -s dotglob && mv /tmp/* /jf
WORKDIR /
RUN chmod -R 777 /jf
app.py:
# At the top of app.py, next to the other imports
import shutil

# Inside lambda_handler: copy the installation files back into /tmp
shutil.copytree("/jf", "/tmp", copy_function=shutil.copy2,
                dirs_exist_ok=True)
# Run jellyfish with downloaded file
jellyfish_command = "/tmp/bin/jellyfish count -C -s 100M -m 20 -t 1 "
command = f"{jellyfish_command}{input_file_path} -o {output_file_path}"
logging.info(subprocess.check_output(command, shell=True))
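Copying /jf into /tmp on every invocation repeats work on warm starts, because /tmp persists between invocations that reuse the same execution environment. A sketch that copies only when the binary isn't already there is below; ensure_installed and its default paths (mirroring the workaround above) are assumptions for illustration.

```python
import os
import shutil


def ensure_installed(src: str = "/jf", dst: str = "/tmp",
                     binary: str = "bin/jellyfish") -> str:
    """Copy the baked-in install tree into dst unless the binary is already there.

    Returns the absolute path of the binary inside dst.
    """
    target = os.path.join(dst, binary)
    if not os.access(target, os.X_OK):
        # copytree uses shutil.copy2 by default, which keeps permission bits
        shutil.copytree(src, dst, dirs_exist_ok=True)
    return target
```

Calling jellyfish_path = ensure_installed() at the start of lambda_handler and building the command from jellyfish_path would replace the unconditional copytree call. The check only looks at the binary, so a copy interrupted halfway would not be repaired.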